





THE ANNALS 


of 


MATHEMATICAL 
STATISTICS 


Copyright 1930 


EDWARDS BROTHERS, Inc. 
ANN ARBOR, MICH 
















EDITORIAL COMMITTEE 


H. C. Carver, Editor 

B. L. Shook, Assistant Editor 

J. Shohat, Foreign Editor

J. W. Edwards, Business Manager 


A quarterly publication sponsored by the American Statistical Association, 
devoted to the theory and application of Mathematical Statistics. 


The rates are six dollars per annum. 


Reprints of any article in this volume may be obtained at any time 
from the Editor at the following rates, postage included. 


Number of copies      Cost per page
 1- 4                 2 cents
 5-24                 1½ cents
25-49                 1 cent
50 and over           ¾ cent


Address: Editor, Annals of Mathematical Statistics
Post Office Box 171, Ann Arbor, Michigan











THE ANNALS OF 
MATHEMATICAL STATISTICS 


VOL. I FEBRUARY NO. 1 


CONTENTS 


PAGE 

The Annals of Mathematical Statistics . . . . . . . 1
Willford I. King

Remarks on Regression . . . . . . . . . . . . . . 3
S. D. Wicksell

Synopsis of Elementary Mathematical Statistics . . . . 14
B. L. Shook

a 
Joseph Berkson 

Mathematical Theory of Seasonal Indices . . . . . 57
Statistical Department, Detroit Edison Company

Stieltjes Integrals in Mathematical Statistics . . . . . 73 


J. Shohat 


Simultaneous Treatment of Discrete and Continuous Prob- 
ability by Use of Stieltjes Integrals . . . . . . . 95 
William Dowell Baten 


Fundamentals of the Theory of Sampling . .. . . . 101 
Editorial 


PUBLISHED QUARTERLY BY 
AMERICAN STATISTICAL ASSOCIATION 
Publication Office—Edwards Brothers, Inc., Ann Arbor, Michigan 
Business Office—530 Commerce Bldg., New York Univ., New York, N. Y. 


Entered as second class matter at the Postoffice at Ann Arbor, Mich., 
under the Act of March 3rd, 1879. 
















THE ANNALS OF MATHEMATICAL STATISTICS
By


WILLFORD I. KING


For ninety-one years the American Statistical Association has 
held the van in matters statistical in the United States. At the time 
when our Association was founded, statistical method was an extremely 
simple science. In recent years, the technique has, however, been growing
more and more complex. The Journal of the American Statistical
Association has served all the members of the Association and an 
attempt has been made to cover, in its pages, all phases of statistical 
method. For some time past, however, it has been evident that the
membership of our organization is tending to become divided into two 
groups — those familiar with advanced mathematics, and those who 
have not devoted themselves to this field. The mathematicians are, 
of course, interested in articles of a type which are not intelligible to 
the non-mathematical readers of our Journal. The Editor of our Jour- 
nal has, then, found it a puzzling problem to satisfy both classes of 
readers. 

Now a happy solution has appeared. The Association at this time 
has the pleasure of presenting to its mathematically inclined members 
the first issue of the ANNALS OF MATHEMATICAL STATISTICS, edited 
by Prof. Harry C. Carver of the University of Michigan. This Journal
will deal, not only with the mathematical technique of statistics,
but also with the applications of such technique to the fields of astron- 
omy, physics, psychology, biology, medicine, education, business, and 
economics. At present, mathematical articles along these lines are 
scattered through a great variety of publications. It is hoped that 
in the future they will be gathered together in the ANNALS. 

The editorial policy will be to select articles that will best meet 
the needs of the time. There can be no questioning the statement 
that at the present time there are in this country many more who 
need stimulation in the fundamentals of mathematical statistics than 
there are individuals whose prime interest is in the advancement of 
modern statistical theory. Therefore particular stress will be laid on
articles of a fundamental nature during the first few years of the 
life of the ANNALS. The officers, after due deliberation, have chosen 
a new method of printing in order to facilitate the composition of
original articles and the obtaining of reprints. A photographic process 
is employed, which will permit the Association at any point in the 










future to furnish reprints or back numbers. The advantages of this 
to libraries and classes in statistics are apparent. A particular effort
will be made to insert from time to time tables that must be constantly 
referred to by statisticians. Nevertheless, the chronicling of research 
will in no sense be neglected. 

My personal opinion is that the advent of the ANNALS constitutes 
an important milestone in the history of our Association. I am sure
that this new publication will be welcomed heartily, not only by the 
mathematically trained section of our membership, but also by the 
non-mathematical group, for the latter recognize that the more ad- 
vanced phases of mathematics are rendering extremely valuable service 
in furthering the progress of statistical technique, thus aiding in the 
solution of problems of the greatest moment. 





REMARKS ON REGRESSION 


S. D. WICKSELL


1. In a paper published twelve years ago¹ I derived a set of
formulae for bivariate regression which were found to give good
results on unimodal materials of a fairly general nature and which,
in the case of moderately skew distributions, were reduced to very
simple and easily applicable forms. Two years later I extended
the theory also to the case of multiple correlations of similar
types². These formulae were deduced on the assumption that the
correlation surface could be expressed by a so-called series of type A³,
i. e. that the deviations from the best fitting normal surface could be
expressed as a series, developed according to the derivatives of different
orders of the Bravais function, expressing that normal surface.


When, after the lapse of so many years, I find that this the- 
ory has not received the attention which it seems to me it merits 
in view of the very simple, and on a fairly large class of curved 
regressions readily applicable results, I attribute this in part at 
least to the apparent (not actual) speciality of the assumptions 
made with regard to the mathematical expression for the corre- 
lation surface, and in part also to the rather repellent show of 
mathematics involved in the deductions. In the hope to give the 
theory a better chance of coming to the attention of statisticians, 
I propose here to deduce some of my main results in an entirely
different way, bringing the theory back on more simple principles. 
I believe that by this method of deduction it will be more easy 
for the reader to see exactly where assumptions come in, and 
also the nature of the restrictions caused by these assumptions. 


2. Let $x$ and $y$ be a pair of correlated variates, our material





1. The correlation function of Type A, and the regression of its characteristics.
Kungl. Svenska Vetenskapsakademiens Handlingar, Bd. 38, Nr. 3, 1917. Also
"Meddelanden från Lunds Astronomiska Observatorium," Ser. II, Nr. 17.
2. Multiple correlation and non-linear regression. Arkiv för Matematik, Fysik
och Astronomi, Bd. 14, Nr. 10, 1919. Also "Meddelanden från Lunds
Astronomiska Observatorium," Ser. I, Nr. 91.
3. Charlier. Contributions to the mathematical theory of statistics. 6. The
correlation function of type A. Arkiv för Matematik, Fysik och Astronomi, Bd.
9, Nr. 26, 1914. Also "Meddelanden från Lunds Astronomiska Observatorium,"
Ser. I, Nr. 58.




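consisting of $N$ such pairs. Computing the means and central
moments, we have

$M_x = \frac{1}{N}\sum x, \qquad M_y = \frac{1}{N}\sum y, \qquad \mu_{ij} = \frac{1}{N}\sum (x - M_x)^i (y - M_y)^j.$

The standard deviations of $x$ and $y$ and the coefficient of
correlation are then defined by

$\sigma_x = \sqrt{\mu_{20}}, \qquad \sigma_y = \sqrt{\mu_{02}}, \qquad r = \frac{\mu_{11}}{\sigma_x \sigma_y}.$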

Following Yule¹ and Pearson² we now treat the problem of
regression as a simple problem of graduation, defining the regression
of $y$ on $x$ as a parabola of a given degree, which, with $x$
as argument, is fitted to the $y$'s by the method of least squares.
The regression may then be written in the form

$y_x - M_y = a_0 + a_1 (x - M_x) + a_2 (x - M_x)^2 + \cdots + a_p (x - M_x)^p,$

and the least squares normal equations for determining the parameters
$a_0, a_1, a_2, \ldots, a_p$ assume the form (Pearson Op. Cit. p. 25).

(1)
$0 = a_0 + a_2 \mu_{20} + a_3 \mu_{30} + \cdots + a_p \mu_{p,0}$
$\mu_{11} = a_1 \mu_{20} + a_2 \mu_{30} + a_3 \mu_{40} + \cdots + a_p \mu_{p+1,0}$
$\mu_{21} = a_0 \mu_{20} + a_1 \mu_{30} + a_2 \mu_{40} + a_3 \mu_{50} + \cdots + a_p \mu_{p+2,0}$
$\mu_{31} = a_0 \mu_{30} + a_1 \mu_{40} + a_2 \mu_{50} + a_3 \mu_{60} + \cdots + a_p \mu_{p+3,0}$
$\cdots$
$\mu_{p,1} = a_0 \mu_{p,0} + a_1 \mu_{p+1,0} + a_2 \mu_{p+2,0} + a_3 \mu_{p+3,0} + \cdots + a_p \mu_{2p,0}$





1. On the Theory of Correlation. Jour. Roy. Stat. Soc., Vol. 60, 1897, and On
the Theory of Correlation for any number of Variables treated by a new
System of Notation. Proc. Roy. Soc., Ser. A, Vol. 79, 1907.

2. Mathematical Contributions to the Theory of Evolution XIV. On the General
Theory of Skew Correlation and non-Linear Regression. Drapers Co.
Research Memoirs Biometric Series II. Cambridge Univ. Press, 1905.







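Writing the solution in the form of determinants, we have

(2) $a_i = \frac{\Delta_i}{\Delta},$

where

(3) $\Delta = \begin{vmatrix}
1 & 0 & \mu_{20} & \mu_{30} & \cdots & \mu_{p,0} \\
0 & \mu_{20} & \mu_{30} & \mu_{40} & \cdots & \mu_{p+1,0} \\
\mu_{20} & \mu_{30} & \mu_{40} & \mu_{50} & \cdots & \mu_{p+2,0} \\
\mu_{30} & \mu_{40} & \mu_{50} & \mu_{60} & \cdots & \mu_{p+3,0} \\
\vdots & & & & & \vdots \\
\mu_{p,0} & \mu_{p+1,0} & \mu_{p+2,0} & \mu_{p+3,0} & \cdots & \mu_{2p,0}
\end{vmatrix}$

and $\Delta_i$ is obtained when the $i$'th row in $\Delta$ is exchanged for the left
membra of equations (1), i. e. for the series of elements:

$0, \ \mu_{11}, \ \mu_{21}, \ \mu_{31}, \ \ldots, \ \mu_{p,1}.$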
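The normal equations (1) can be tried out numerically. The following is a minimal sketch, not the author's procedure: it forms the central moments of a small sample, assembles the moment matrix whose determinant is $\Delta$, and solves for $a_0, \ldots, a_p$ by elimination instead of expanding determinants. The helper names (`solve`, `central_moment`, `regression_coefficients`) and the five sample points are assumptions made only for this illustration; exact fractions are used so that no rounding enters.

```python
from fractions import Fraction


def solve(A, b):
    """Solve the linear system A x = b exactly by Gauss-Jordan elimination."""
    n = len(b)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        # Bring the largest available pivot into position.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]


def central_moment(xs, ys, i, j):
    """mu_ij = (1/N) * sum (x - M_x)^i (y - M_y)^j."""
    n = len(xs)
    mx, my = Fraction(sum(xs), n), Fraction(sum(ys), n)
    return sum((x - mx) ** i * (y - my) ** j for x, y in zip(xs, ys)) / n


def regression_coefficients(xs, ys, p):
    """Coefficients a_0..a_p of the least squares parabola of degree p."""
    # Row i of the normal equations: mu_{i1} = sum_k a_k * mu_{i+k,0}.
    A = [[central_moment(xs, ys, i + k, 0) for k in range(p + 1)]
         for i in range(p + 1)]
    b = [central_moment(xs, ys, i, 1) for i in range(p + 1)]
    return solve(A, b)


# Hypothetical sample lying exactly on y = x^2.
xs = [0, 1, 2, 3, 4]
ys = [x * x for x in xs]
coeffs = regression_coefficients(xs, ys, 2)
```

For this sample $M_x = 2$ and $M_y = 6$, so $y - M_y = (x - M_x)^2 + 4(x - M_x) - 2$, and the solver returns $a_0 = -2$, $a_1 = 4$, $a_2 = 1$; the first of equations (1) is satisfied because $\mu_{20} = 2$.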

3. Some important general conclusions may at once be derived
from this system. Defining as non-regression of the $p$'th order
the case that all the coefficients $a_1, a_2, a_3, \ldots, a_p$ turn out to
be practically equal to zero, i. e. that a horizontal straight line
is the best parabola of the $p$'th degree that can be fitted to the series
of $y$'s, it is first seen, from the first of equations (1), that then also
$a_0 = 0$. Secondly we can draw the conclusion that this can take
place only if all the elements $\mu_{11}, \mu_{21}, \mu_{31}, \ldots, \mu_{p,1}$ are
equal to zero. Hence the condition for non-regression of the $p$'th
order of $y$ on $x$ is that we have

(4) $\mu_{i1} = 0$ for $i = 1, 2, 3, \ldots, p.$

This clearly involves also that the coefficient of correlation, $r$,
equals zero.

Defining further as linear regression of the $p$'th order the case
that the coefficients $a_2, a_3, \ldots, a_p$ are equal to zero, i. e. that
a non-horizontal straight line is the best parabola of the $p$'th
degree that can be fitted to the series of $y$'s, we immediately
see, from the two first of equations (1), that then we must have

(5) $a_0 = 0, \qquad a_1 = \frac{\mu_{11}}{\mu_{20}}.$










Referring here to the well-known theorem that any determinant
will disappear, when the elements of two rows are proportional
(the elements of any one row being obtained by multiplying the
corresponding elements of another row by a constant factor) it
is easily seen that all the determinants $\Delta_i$ except $\Delta_1$, and hence
by (2) all the coefficients $a_0, \ldots, a_p$, except $a_1$, will disappear
if the quantities $0, \mu_{11}, \mu_{21}, \mu_{31}, \ldots, \mu_{p,1}$ in the left membra of
(1) are proportional to the elements $0, \mu_{20}, \mu_{30}, \ldots, \mu_{p+1,0}$
in the second row of the determinant $\Delta$. Hence the condition for linear
regression of the $p$'th order of $y$ on $x$ is that we have

(6) $\mu_{i1} = \frac{\mu_{11}}{\mu_{20}}\,\mu_{i+1,0}$ for $i = 1, 2, 3, \ldots, p.$

A few considerations will show that this condition is not only
sufficient but also necessary. For $p = 3$ these criteria were demonstrated
by Pearson.


4. Thus far there are no other assumptions involved than
the principle of least squares, and that the regression of $y$ on $x$
may be described by a whole rational function. The chief difficulty
in the application of this theory of regression is that, as
seen from equation (1), in order to determine a regression of the
$p$'th degree we must compute and use moments of the series of
$x$'s up to the order $2p$. Now, as justly remarked by Pearson, moments
of high orders are, on account of their large standard
errors, very little to be relied upon, at least in the case of ordinary
materials ($N$ not very large). Besides this, the numerical
labor involved in computing higher moments is comparatively
very great. Hence, Pearson's theory of regression will be practically
applicable only in cases when the regression is at the most
parabolic of the second degree. Indeed, this is a very serious
restriction, because curved regressions often have at least one inflection.
Thus in order to meet fairly frequent cases of regression we
must needs have recourse at least to cubic parabolas. But this should
require the computation of all the moments of $x$ up to the sixth order.


In order to remove, as far as possible, this difficulty, I take
refuge in a golden rule expressed by Thiele¹. Thiele introduces,
instead of the moments, a system of coefficients called the semi-invariants.
These semi-invariants (here denoted by $\lambda_{i0}$) are
defined in terms of the moments by the identity:


1. Theory of Observations. London 1903, p. 49.








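$\lambda_{20}\frac{t^2}{2!} + \lambda_{30}\frac{t^3}{3!} + \lambda_{40}\frac{t^4}{4!} + \cdots = \log_e\left(1 + \mu_{20}\frac{t^2}{2!} + \mu_{30}\frac{t^3}{3!} + \mu_{40}\frac{t^4}{4!} + \cdots\right)$


Developing, we find

(7) $\lambda_{20} = \mu_{20}; \quad \lambda_{30} = \mu_{30}; \quad \lambda_{40} = \mu_{40} - 3\mu_{20}^2;$
$\lambda_{50} = \mu_{50} - 10\mu_{30}\mu_{20}; \quad \lambda_{60} = \mu_{60} - 15\mu_{40}\mu_{20} - 10\mu_{30}^2 + 30\mu_{20}^3.$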

Now, the rule indicated by Thiele is the following:


To obtain the first semi-invariants rely entirely on computations.
To obtain the intermediate semi-invariants rely partly on
computations, partly on theoretical considerations. But to
obtain the higher semi-invariants rely entirely on theoretical
considerations.


Of course, this rule is just as well applicable to the determination
of moments, as any moment may be expressed in terms
of the semi-invariants of the same and lower order. In particular
we have


(8) $\mu_{20} = \lambda_{20}; \quad \mu_{30} = \lambda_{30}; \quad \mu_{40} = \lambda_{40} + 3\lambda_{20}^2;$
$\mu_{50} = \lambda_{50} + 10\lambda_{30}\lambda_{20}; \quad \mu_{60} = \lambda_{60} + 15\lambda_{40}\lambda_{20} + 10\lambda_{30}^2 + 15\lambda_{20}^3.$
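Relations (7) and (8) are inverse to one another, which is easy to confirm by machine. The sketch below is an illustration only: the function names and the dictionary layout (keys 2 to 6 for the order) are assumptions, and the two sets of test moments are the well-known central moments of the normal curve with unit variance and of the exponential distribution with unit mean, neither of which is discussed in the text.

```python
def semi_invariants(mu):
    """Equation (7): semi-invariants of orders 2..6 from central moments."""
    return {
        2: mu[2],
        3: mu[3],
        4: mu[4] - 3 * mu[2] ** 2,
        5: mu[5] - 10 * mu[3] * mu[2],
        6: mu[6] - 15 * mu[4] * mu[2] - 10 * mu[3] ** 2 + 30 * mu[2] ** 3,
    }


def moments(lam):
    """Equation (8): central moments of orders 2..6 from semi-invariants."""
    return {
        2: lam[2],
        3: lam[3],
        4: lam[4] + 3 * lam[2] ** 2,
        5: lam[5] + 10 * lam[3] * lam[2],
        6: lam[6] + 15 * lam[4] * lam[2] + 10 * lam[3] ** 2 + 15 * lam[2] ** 3,
    }


# Central moments of two textbook distributions (integers, so arithmetic is exact).
normal_mu = {2: 1, 3: 0, 4: 3, 5: 0, 6: 15}
exponential_mu = {2: 1, 3: 2, 4: 9, 5: 44, 6: 265}

normal_lam = semi_invariants(normal_mu)
exponential_lam = semi_invariants(exponential_mu)
```

For the normal curve every semi-invariant above the second vanishes, which is the case in which dropping the higher ones costs nothing; for the exponential distribution they come out as 1, 2, 6, 24, 120 and do not die away.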


5. A most natural way of applying the rule is afforded by
Pearson's celebrated theory of frequency-functions. The moments
$\mu_{i0}$ are the moments of one of the marginal distributions (here
the distribution of the $x$'s). Computing $\mu_{20}$, $\mu_{30}$ and $\mu_{40}$ in the
ordinary way from the observations, criteria can be formed¹
showing to which of the Pearson Types the frequency curve of
$x$ belongs. This being decided, the parameters of the curve may
be determined by the aid of the same moments. As the moments
of higher order are easily expressed in terms of the parameters
we get, in this way, $\mu_{50}$ and $\mu_{60}$ expressed in terms of $\mu_{20}$, $\mu_{30}$ and
$\mu_{40}$.


To state the matter in a more general way, we may use the
formulae given by Pearson in his memoir on regression, loc. cit.
pp. 5 and 6.

1. See W. Palin Elderton: Frequency Curves and Correlation, London 1927,
Table VI.










Pearson starts from a differential equation of the form

(9) $f'(x)\,(b_0 + b_1 x + b_2 x^2 + b_3 x^3 + \cdots) = (x + a)\,f(x),$

where $f(x)$ is the frequency function of $x$.


Multiplying on both sides by $x^n$ and integrating by parts, he
finds the following formulae¹ (placing the origin in the mean)

(10) $n b_0 \mu_{n-1,0} + (n+1)\, b_1 \mu_{n,0} + (n+2)\, b_2 \mu_{n+1,0} + \cdots = -\mu_{n+1,0} - a\,\mu_{n,0}.$


Now, Pearson remarks that experience shows that for the great bulk
of frequency distributions the higher terms, multiplied by $b_3$, $b_4$, etc.,
may be neglected. In fact, Pearson's system of frequency curves is
obtained as a result of putting $b_s = 0$ for $s \geq 3$.


Following Pearson's example, we get the recursion formula,

(11) $n b_0 \mu_{n-1,0} + [(n+1)\, b_1 + a]\,\mu_{n,0} = -[(n+2)\, b_2 + 1]\,\mu_{n+1,0}.$


Putting here $n = 0, 1, 2, 3$, we get four equations to determine $a$,
$b_0$, $b_1$, and $b_2$ in terms of the moments $\mu_{20}$, $\mu_{30}$ and $\mu_{40}$. This
being done, we get $\mu_{50}$ and $\mu_{60}$ on putting $n = 4$ and 5.
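The two steps just described can be followed in a short computation. This is a sketch under the sign convention in which the recursion reads $n b_0 \mu_{n-1,0} + [(n+1) b_1 + a]\mu_{n,0} = -[(n+2) b_2 + 1]\mu_{n+1,0}$, with $\mu_{00} = 1$ and $\mu_{10} = 0$; the function names and the two check cases (normal curve, and an exponential distribution with unit mean, which is a Pearson Type III curve) are assumptions chosen for the illustration.

```python
from fractions import Fraction


def solve(A, b):
    """Solve the linear system A x = b exactly by Gauss-Jordan elimination."""
    n = len(b)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]


def pearson_constants(mu2, mu3, mu4):
    """Solve the recursion for n = 0, 1, 2, 3; returns [a, b0, b1, b2]."""
    mu = {-1: 0, 0: 1, 1: 0, 2: mu2, 3: mu3, 4: mu4}  # mu[-1] is only a placeholder
    rows, rhs = [], []
    for n in range(4):
        # a*mu_n + b0*n*mu_{n-1} + b1*(n+1)*mu_n + b2*(n+2)*mu_{n+1} = -mu_{n+1}
        rows.append([mu[n], n * mu[n - 1], (n + 1) * mu[n], (n + 2) * mu[n + 1]])
        rhs.append(-mu[n + 1])
    return solve(rows, rhs)


def extrapolate_moments(mu2, mu3, mu4):
    """Fifth and sixth central moments implied by the recursion (n = 4, 5)."""
    a, b0, b1, b2 = pearson_constants(mu2, mu3, mu4)
    mu = {0: 1, 1: 0, 2: mu2, 3: mu3, 4: mu4}
    for n in (4, 5):
        mu[n + 1] = -(n * b0 * mu[n - 1] + ((n + 1) * b1 + a) * mu[n]) / ((n + 2) * b2 + 1)
    return mu[5], mu[6]


normal_case = extrapolate_moments(1, 0, 3)
exponential_case = extrapolate_moments(1, 2, 9)
```

In both check cases the extrapolated values agree with the true moments (0 and 15 for the normal curve, 44 and 265 for the exponential), as they must when the distribution really belongs to Pearson's system.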


The procedure indicated above leads, in fact, to the theory 
of skew regression which is the natural consequence of Pearson’s 
theory of skew frequency curves. 


6. As the theory just indicated above is at present at my
request being worked out in detail by one of my pupils, Mr.
Walter Anderson, I refrain from proceeding further into the
matter.


It remains, however, to show how the special formulae for
cubic regression, given by me twelve years ago, arise out of a
somewhat similar procedure.


Instead of starting from Pearson's theory of frequency functions,
I now start from Thiele's theory of frequency functions.
Just as in the preceding section the coefficients $b_3$, $b_4$, etc. were
neglected in the equation (10), given by Pearson, I now neglect
the semi-invariants $\lambda_{50}$ and $\lambda_{60}$ in the equations (8), given by
Thiele. There is no doubt that the former approximation is of


1. See also Palin Elderton, Op. cit. p. 39.







far more general validity than the latter; still the latter may be
justified by the following considerations.


Assuming the variate $x$ to be generated as the sum of a large
number of independent, elementary increments, each of which
has its own frequency distribution and its own set of semi-invariants,
it follows from the theory of Thiele that any semi-invariant
$\lambda_{r0}$ of $x$ is the sum of the elementary semi-invariants of the same
order. Supposing the elementary increments to be $s$ in number and
denoting by $\bar\lambda_r$ the mean value of the $s$ elementary semi-invariants
of order $r$ we consequently have

$\lambda_{r0} = s\,\bar\lambda_r.$

Hence we get

$\gamma_{r0} = \frac{\lambda_{r0}}{\lambda_{20}^{r/2}} = \frac{\bar\lambda_r}{\bar\lambda_2^{r/2}} \cdot \frac{1}{s^{(r-2)/2}}.$

Except under rather special conditions, which it is not necessary
to dwell on here, the ratios $\bar\lambda_r / \bar\lambda_2^{r/2}$ are not excessively great.
Thus if $s$ is a large number we see that the "standardized" semi-invariants
$\gamma_{r0}$ of $x$ are small of the order of magnitude of $s^{-(r-2)/2}$.
In particular we have

$\gamma_{30}$ of the order $s^{-1/2}$
$\gamma_{40}$ of the order $s^{-1}$
$\gamma_{50}$ of the order $s^{-3/2}$
$\gamma_{60}$ of the order $s^{-2}$

We now have, denoting by

$\alpha_{r0} = \frac{\mu_{r0}}{\mu_{20}^{r/2}}$

the "standardized" moment of $x$, by a simple transformation of
equation (8),

(8') $\alpha_{20} = 1; \quad \alpha_{30} = \gamma_{30}; \quad \alpha_{40} = \gamma_{40} + 3;$
$\alpha_{50} = \gamma_{50} + 10\gamma_{30}; \quad \alpha_{60} = \gamma_{60} + 15\gamma_{40} + 10\gamma_{30}^2 + 15.$

Stopping with quantities of the order $1/s$ we get

(13) $\alpha_{50} = 10\gamma_{30}; \quad \alpha_{60} = 15\gamma_{40} + 10\gamma_{30}^2 + 15.$










In practice we can, of course, not very well know if the hypothesis
of elementary increments is valid, but if we have, on
computing the moments up to the fourth order, found that $\gamma_{30}$ and
$\gamma_{40}$ are rather small, and that $\gamma_{40}$ is of the order of magnitude
of $\gamma_{30}^2$, there is a certain plausibility in assuming that $\gamma_{50}$ and $\gamma_{60}$
are still smaller and that they may be neglected as compared to
$\gamma_{30}$ and $\gamma_{40}$.

The curve of cubic regression of $y$ on $x$ we may write in the
form

$\eta_\xi = c_0 + c_1 \xi + c_2 \xi^2 + c_3 \xi^3,$

where we have put

$\xi = \frac{x - M_x}{\sqrt{\mu_{20}}}; \qquad \eta = \frac{y - M_y}{\sqrt{\mu_{02}}}$

and it is evident that equation (1) now takes the form

$0 = c_0 + c_2 + c_3 \alpha_{30}$
$r = c_1 + c_2 \alpha_{30} + c_3 \alpha_{40}$
$\alpha_{21} = c_0 + c_1 \alpha_{30} + c_2 \alpha_{40} + c_3 \alpha_{50}$
$\alpha_{31} = c_0 \alpha_{30} + c_1 \alpha_{40} + c_2 \alpha_{50} + c_3 \alpha_{60}$


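We get

(14)
$\Delta = \alpha_{60}(\alpha_{40} - \alpha_{30}^2 - 1) - \alpha_{50}(\alpha_{50} - 2\alpha_{30}\alpha_{40} - 2\alpha_{30}) - \alpha_{40}(\alpha_{40}^2 - \alpha_{40} + 3\alpha_{30}^2) + \alpha_{30}^4$

$\Delta_0 = r(\alpha_{30}\alpha_{60} - \alpha_{50}\alpha_{40}) - \alpha_{21}(\alpha_{60} - \alpha_{40}^2) + \alpha_{31}(\alpha_{50} - \alpha_{30}\alpha_{40})$
$\qquad + r\alpha_{30}(\alpha_{40}^2 - \alpha_{30}\alpha_{50}) + \alpha_{21}\alpha_{30}(\alpha_{50} - \alpha_{30}\alpha_{40}) - \alpha_{31}\alpha_{30}(\alpha_{40} - \alpha_{30}^2)$

$\Delta_1 = r(\alpha_{40}\alpha_{60} - \alpha_{50}^2 - \alpha_{60} + 2\alpha_{30}\alpha_{50} - \alpha_{30}^2\alpha_{40})$
$\qquad - \alpha_{21}(\alpha_{30}\alpha_{60} - \alpha_{40}\alpha_{50} + \alpha_{30}\alpha_{40} - \alpha_{30}^3) - \alpha_{31}(\alpha_{40}^2 - \alpha_{30}\alpha_{50} - \alpha_{40} + \alpha_{30}^2)$

$\Delta_2 = r(\alpha_{40}\alpha_{50} - \alpha_{30}\alpha_{60} - \alpha_{30}\alpha_{40} + \alpha_{30}^3) + \alpha_{21}(\alpha_{60} - \alpha_{40}^2 - \alpha_{30}^2)$
$\qquad - \alpha_{31}(\alpha_{50} - \alpha_{30}\alpha_{40} - \alpha_{30})$

$\Delta_3 = r(\alpha_{30}\alpha_{50} - \alpha_{40}^2 + \alpha_{40} - \alpha_{30}^2) - \alpha_{21}(\alpha_{50} - \alpha_{30}\alpha_{40} - \alpha_{30})$
$\qquad + \alpha_{31}(\alpha_{40} - \alpha_{30}^2 - 1)$


And the coefficients are

$c_0 = \frac{\Delta_0}{\Delta}; \quad c_1 = \frac{\Delta_1}{\Delta}; \quad c_2 = \frac{\Delta_2}{\Delta}; \quad c_3 = \frac{\Delta_3}{\Delta}.$

We now introduce the semi-invariants by (8'), taking for
$\alpha_{50}$ and $\alpha_{60}$ the approximate formulae (13). For $\alpha_{21}$ and $\alpha_{31}$ we put

(15) $\alpha_{21} = \gamma_{21}; \quad \alpha_{31} = \gamma_{31} + 3r.$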

The coefficients $\gamma_{21}$ and $\gamma_{31}$ are then the standardized correlation
semi-invariants, according to a generalized theory of semi-invariants
for bi-variate distributions.


It is now a consequence of our principle of approximation
that all powers and products $\gamma_{ij}\gamma_{kl}\gamma_{mn}\cdots$, of which the
sum $i+j+k+l+m+n+\cdots$ of the indices exceeds 6, shall be
neglected as compared to powers and products of lower order.
Observing this, the determinants reduce to the following:

$\Delta = 12(1 - 2\gamma_{30}^2 + 2\gamma_{40}),$
or $\frac{1}{\Delta} = \frac{1}{12}(1 + 2\gamma_{30}^2 - 2\gamma_{40}),$
$\Delta_0 = 6(r\gamma_{30} - \gamma_{21}),$
$\Delta_1 = 12r + 6(r\gamma_{40} - \gamma_{31}) + 24r\gamma_{40} - 24r\gamma_{30}^2 - 12\gamma_{30}(r\gamma_{30} - \gamma_{21}),$
$\Delta_2 = -6(r\gamma_{30} - \gamma_{21}),$
$\Delta_3 = -2(r\gamma_{40} - \gamma_{31}) + 6\gamma_{30}(r\gamma_{30} - \gamma_{21}).$


Using the same rule of approximation on multiplying by $1/\Delta$
we finally get










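(16)
$c_0 = \frac{1}{2}(r\gamma_{30} - \gamma_{21}),$
$c_1 = r + \frac{1}{2}(r\gamma_{40} - \gamma_{31}) - \gamma_{30}(r\gamma_{30} - \gamma_{21}),$
$c_2 = -\frac{1}{2}(r\gamma_{30} - \gamma_{21}),$
$c_3 = -\frac{1}{6}(r\gamma_{40} - \gamma_{31}) + \frac{1}{2}\gamma_{30}(r\gamma_{30} - \gamma_{21}).$


In my cited memoir of twelve years ago I put¹

$r_{30} = \frac{1}{2}(r\gamma_{30} - \gamma_{21}); \qquad r_{40} = \frac{1}{6}(r\gamma_{40} - \gamma_{31}).$

Using this notation, we get

(17)
$c_0 = r_{30},$
$c_1 = r + 3r_{40} - 2\gamma_{30} r_{30},$
$c_2 = -r_{30},$
$c_3 = -r_{40} + \gamma_{30} r_{30}.$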


These coefficients are exactly the same as in equation (34*, II)
of my former memoir. As shown in that memoir on several numerical
examples, the regression formula in question applies very
well in cases of moderately skew correlations.


It is seen that the coefficients $r_{30}$ and $r_{40}$ determine the
curvature of the regression. If $r_{30} = r_{40} = 0$ the regression is linear
(of the third order). I have called these coefficients the correlation
coefficients of higher order. If the correlation surface is
approximately normal we have the following formulae for the
standard errors of the coefficients involved:
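How close the approximate coefficients come to the full solution of the four normal equations can be judged numerically. The sketch below is an illustration, not part of the memoir: it takes the closed forms $c_0 = r_{30}$, $c_1 = r + 3r_{40} - 2\gamma_{30}r_{30}$, $c_2 = -r_{30}$, $c_3 = -r_{40} + \gamma_{30}r_{30}$ with $r_{30} = \frac{1}{2}(r\gamma_{30} - \gamma_{21})$ and $r_{40} = \frac{1}{6}(r\gamma_{40} - \gamma_{31})$, and compares them with an elimination solution of the standardized normal equations in which $\alpha_{50}$ and $\alpha_{60}$ are replaced by their approximations. The function names and the numerical values of $r$ and the $\gamma$'s are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting (floats)."""
    n = len(b)
    M = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]


def approximate_coefficients(r, g30, g40, g21, g31):
    """Closed-form cubic regression coefficients c_0..c_3."""
    r30 = 0.5 * (r * g30 - g21)
    r40 = (r * g40 - g31) / 6.0
    return [r30, r + 3 * r40 - 2 * g30 * r30, -r30, -r40 + g30 * r30]


def normal_equation_coefficients(r, g30, g40, g21, g31):
    """Full solution of the four standardized normal equations."""
    a30, a40 = g30, g40 + 3
    a50 = 10 * g30                        # gamma_50 neglected
    a60 = 15 * g40 + 10 * g30 ** 2 + 15   # gamma_60 neglected
    a21, a31 = g21, g31 + 3 * r
    A = [
        [1, 0, 1, a30],
        [0, 1, a30, a40],
        [1, a30, a40, a50],
        [a30, a40, a50, a60],
    ]
    return solve(A, [0, r, a21, a31])


# Hypothetical, moderately skew case.
params = dict(r=0.5, g30=0.1, g40=0.02, g21=0.03, g31=0.01)
approx = approximate_coefficients(**params)
full = normal_equation_coefficients(**params)
largest_gap = max(abs(a - f) for a, f in zip(approx, full))
```

With these values the two sets of coefficients differ by a few units in the fourth decimal place, which is the size of the neglected products; larger $\gamma$'s would widen the gap.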








1. In Pearson's notation we have 4,: ? € and r,, = é é. 







_[é. | 24. -/2+4r* . [e+ter® 
KdV § PWV * EY IW *%V Ww 
ae | eS ll mS. + ae me [zr : 
V8) %, 7 VN’ 5 ZN *%%,)VON * To) 2N ’ 
a eeeriers, _  fier®. . . [er 
(iJ¥  2N —SV2N ’ Is) VON 





Lund (Sweden). 








SYNOPSIS OF ELEMENTARY MATHEMATICAL 
STATISTICS* 


By 
B. L. SHOOK


SECTION I. ELEMENTARY STATISTICAL FUNCTIONS


1. Variates. Practically all statistical data is obtained as the
result of observations that endeavor to establish the magnitudes of
certain variables. The individual magnitudes that are recorded are
termed variates. Thus in computing the average annual rainfall of
a region, the variable is rainfall, and the amount of rainfall for any
single year is a variate. Likewise, if the bank clearings for the City
of New York be under consideration, then the variable is bank clearings,
and the clearings for any specified interval is a variate.

2. The arithmetic mean of a series of variates is equal to the
sum of the variates divided by the number of variates in the series.
If $M_v$ designates the arithmetic mean of the $N$ variates $v_1, v_2,
v_3, \ldots, v_N$,¹

(1) $M_v = \frac{1}{N}(v_1 + v_2 + v_3 + \cdots + v_N) = \frac{1}{N}\sum v$


3. The $n$th moment of a series of variates is defined as the
arithmetic mean of the $n$th powers of these variates and is represented
by the symbol $\mu'_{n:v}$. Thus

(2) $\mu'_{n:v} = \frac{1}{N}(v_1^n + v_2^n + v_3^n + \cdots + v_N^n) = \frac{1}{N}\sum v^n$


That is

$\mu'_{1:v} = \frac{1}{N}\sum v$
$\mu'_{2:v} = \frac{1}{N}\sum v^2$
$\mu'_{3:v} = \frac{1}{N}\sum v^3$


* An abstract of a series of lectures on elementary statistics given by the
mathematical statistical staff at the University of Michigan.

1. Observe that the number of variates in a series is denoted by $N$, whereas the
smaller italic $n$ is employed as an ordinal number.








Obviously, by definitions (1) and (2)

(3) $\mu'_{1:v} = M_v$


4. The deviation of a variate from the arithmetic mean will be
designated by the symbol $\bar v$, i. e.

(4) $\bar v_i = v_i - M_v$

5. The $n$th moment about the mean* is defined as the arithmetic
mean of the $n$th powers of the deviations of the variates from
the mean, and is represented symbolically by $\mu_{n:v}$. Thus

(5) $\mu_{n:v} = \frac{1}{N}\sum \bar v^n$, so that

(5a) $\mu_{1:v} = \frac{1}{N}\sum \bar v = 0$

(5b) $\mu_{2:v} = \frac{1}{N}\sum \bar v^2$

(5c) $\mu_{3:v} = \frac{1}{N}\sum \bar v^3$


The fact that $\mu_{1:v} = 0$ is demonstrated as follows:

$\bar v_1 = v_1 - M_v$
$\bar v_2 = v_2 - M_v$
$\cdots$
$\bar v_N = v_N - M_v$

$\sum \bar v = \sum v - N M_v$

$\mu_{1:v} = \frac{1}{N}\sum \bar v = \frac{1}{N}\sum v - M_v = M_v - M_v = 0$ Q. E. D.


The numerical example of Table I illustrates the definitions of
the preceding paragraphs. The data consists of thirteen variates,
which represent the number of even numbers found in consecutive
blocks of 100 numbers, drawn to determine the order of call for drafting
United States soldiers in 1918. These variates were obtained from
the first 1300 drawings made.


* For convenience the arithmetic mean is frequently referred to as the mean.
When referring to geometric or harmonic means, the adjectives geometric or
harmonic must therefore be specified.


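The most obvious conclusion to be drawn from Table I is that
the use of fractions in determining the values of $\mu_{n:v}$ is cumbersome.
If $M_v$ is a whole number, then the values of $\bar v$, $\bar v^2$ and $\bar v^3$ are integers,
and the procedure is simple. Generally, however, $M_v$ will be fractional,
and consequently awkward expressions for $\bar v$, $\bar v^2$ and $\bar v^3$ will result.
On the other hand, the computation of values of $\mu'_{n:v}$ is relatively easy,
and hence it is expedient to express $\mu_{2:v}$ and $\mu_{3:v}$ in terms of the moments
$\mu'_{n:v}$. This may be done as follows:


Since by definition,

$\bar v_i = v_i - M_v$, it follows that

$\bar v_i^2 = v_i^2 - 2 v_i M_v + M_v^2$, and

$\bar v_i^3 = v_i^3 - 3 v_i^2 M_v + 3 v_i M_v^2 - M_v^3$


Consequently

$\bar v_1^2 = v_1^2 - 2 M_v v_1 + M_v^2 \qquad \bar v_1^3 = v_1^3 - 3 v_1^2 M_v + 3 v_1 M_v^2 - M_v^3$
$\bar v_2^2 = v_2^2 - 2 M_v v_2 + M_v^2 \qquad \bar v_2^3 = v_2^3 - 3 v_2^2 M_v + 3 v_2 M_v^2 - M_v^3$
$\bar v_3^2 = v_3^2 - 2 M_v v_3 + M_v^2 \qquad \bar v_3^3 = v_3^3 - 3 v_3^2 M_v + 3 v_3 M_v^2 - M_v^3$
$\cdots$
$\bar v_N^2 = v_N^2 - 2 M_v v_N + M_v^2 \qquad \bar v_N^3 = v_N^3 - 3 v_N^2 M_v + 3 v_N M_v^2 - M_v^3$

$\sum \bar v^2 = \sum v^2 - 2 M_v \sum v + N M_v^2$

$\sum \bar v^3 = \sum v^3 - 3 M_v \sum v^2 + 3 M_v^2 \sum v - N M_v^3$


Dividing both sides of these equations through by $N$ yields,
respectively

$\mu_{2:v} = \mu'_{2:v} - 2 M_v \cdot M_v + M_v^2$
$\mu_{3:v} = \mu'_{3:v} - 3 M_v \mu'_{2:v} + 3 M_v^2 \cdot M_v - M_v^3$


Hence

(6) $\mu_{2:v} = \mu'_{2:v} - M_v^2$
$\mu_{3:v} = \mu'_{3:v} - 3 M_v \mu'_{2:v} + 2 M_v^3$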


TABLE I

   v      v^2       v^3      \bar v     \bar v^2      \bar v^3
  51     2601    132651      -3/13        9/169        -27/2197
  49     2401    117649     -29/13      841/169     -24389/2197
  60     3600    216000     114/13    12996/169    1481544/2197
  53     2809    148877      23/13      529/169      12167/2197
  48     2304    110592     -42/13     1764/169     -74088/2197
  51     2601    132651      -3/13        9/169        -27/2197
  42     1764     74088    -120/13    14400/169   -1728000/2197
  50     2500    125000     -16/13      256/169      -4096/2197
  51     2601    132651      -3/13        9/169        -27/2197
  52     2704    140608      10/13      100/169       1000/2197
  54     2916    157464      36/13     1296/169      46656/2197
  53     2809    148877      23/13      529/169      12167/2197
  52     2704    140608      10/13      100/169       1000/2197
 ---    -----   -------     ------    ---------   -------------
 666    34314   1777716        0      32838/169    -276120/2197

$M_v = \frac{666}{13}$

$\mu_{1:v} = 0$

$\mu_{2:v} = \frac{32838}{13 \cdot 169} = \frac{2526}{169}$

$\mu_{3:v} = -\frac{276120}{13 \cdot 2197} = -\frac{21240}{2197}$







These formulae are perhaps the most important in our work, since
they enable us to obtain the moments about the mean without requiring
that we actually determine the deviations. Applying these formulae
to the numerical example of Table I,

$\mu_{2:v} = \frac{34314}{13} - \left(\frac{666}{13}\right)^2 = \frac{2526}{169}$

$\mu_{3:v} = \frac{1777716}{13} - 3\left(\frac{666}{13}\right)\left(\frac{34314}{13}\right) + 2\left(\frac{666}{13}\right)^3 = -\frac{21240}{2197}$


The results thus obtained by this indirect method are identical with 
the results obtained in Table I by employing the direct method. 
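The agreement of the two methods can be checked with exact fractions. The sketch below is only an illustration: the list `variates` holds the thirteen counts as read from Table I, and the function and variable names are assumptions introduced for the example.

```python
from fractions import Fraction

# The thirteen variates of Table I (even numbers per block of 100 drawings).
variates = [51, 49, 60, 53, 48, 51, 42, 50, 51, 52, 54, 53, 52]


def moment_about_origin(vs, n):
    """mu'_n: arithmetic mean of the n-th powers of the variates."""
    return Fraction(sum(v ** n for v in vs), len(vs))


def moment_about_mean(vs, n):
    """mu_n by the direct method: mean of the n-th powers of the deviations."""
    mean = moment_about_origin(vs, 1)
    return sum((v - mean) ** n for v in vs) / len(vs)


mean = moment_about_origin(variates, 1)
raw2 = moment_about_origin(variates, 2)
raw3 = moment_about_origin(variates, 3)

# Indirect method, formula (6).
mu2_indirect = raw2 - mean ** 2
mu3_indirect = raw3 - 3 * mean * raw2 + 2 * mean ** 3

# Direct method, formulas (5b) and (5c).
mu2_direct = moment_about_mean(variates, 2)
mu3_direct = moment_about_mean(variates, 3)
```

Both routes give $\mu_{2:v} = 2526/169$ and $\mu_{3:v} = -21240/2197$, but the indirect one never forms a fractional deviation.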


7. Standard Deviation. The second moment about the mean,
$\mu_{2:v}$, is a function of the variability of the data, since its essential
elements are the deviations of the variates from the mean. But if the
original variates happen to be measured in inches, then since $\mu_{2:v}$ is
the average of the squares of the deviations, it follows that the unit
of $\mu_{2:v}$ is square inch. Nevertheless, by extracting the square root
of $\mu_{2:v}$ we would obtain a function which would in general measure the
variability of, and possess the same unit as the original data. This
function is known as the standard deviation and is denoted by the
symbol $\sigma_v$. Thus

(7) $\sigma_v = \sqrt{\mu_{2:v}}$


Verbally we may say that the standard deviation is defined as the
square root of the mean of the squared deviations of the variates from
their mean.


Actually $\sigma_v$ is rarely computed directly from the squared deviations,
but rather by employing the relationship given in formula (6).
For the data of Table I

$\sigma_v = \sqrt{\frac{2526}{169}} = \frac{50.2593}{13} = 3.86610$


8. Standard Units. If we assume that the arithmetic mean and
the standard deviation of the weights of adult males are 150 lbs. and
20 lbs. respectively, then we may say that a man weighing 190 lbs. is
40 lbs. or 2 standard units above the average in weight. Likewise an
individual weighing 120 lbs. may be considered as being 30 lbs. or
1.5 standard units under average weight. Conversely, if the arithmetic
mean and the standard deviation for heights be 67 inches and
2.5 inches respectively, then an individual who is 2 standard units above
the average height must be five inches above the average stature, or
in other words must be 72 inches tall. The magnitude of an observation
expressed in standard units is therefore defined as follows:

(8) $t_i = \frac{v_i - M_v}{\sigma_v} = \frac{\bar v_i}{\sigma_v}$


It will be observed that these standard variates, $t_i$, are abstract
numbers. For example, if the original variates be expressed in the
unit inch then the unit of $M_v$, $\bar v$ and $\sigma_v$ is also inch, and it follows
that if both the numerator and denominator of a fraction be expressed
as inches the quotient must be an abstract number, independent of the
unit employed in the measurements. For instance, one series of variates
would result if the height of each of a group of individuals were
recorded in inches. However, if their heights had been recorded in
centimeters, each of the resulting set of variates would be numerically
about 2.54 times as large as the corresponding variate expressed in
inches. Nevertheless, the standard variates obtained by both methods
would agree in the case of each individual. Thus, if

$M_v$ = 67 ins. = 67(2.54) cms.,
and
$\sigma_v$ = 2.5 ins. = 2.5(2.54) cms.,

then for an individual 6 feet tall

$v$ = 72 ins. = 72(2.54) cms.,
$\bar v$ = 5 ins. = 5(2.54) cms.,

$t = \frac{5 \text{ ins.}}{2.5 \text{ ins.}} = \frac{5(2.54) \text{ cms.}}{2.5(2.54) \text{ cms.}} = 2$


With the aid of a computing machine, the series of standard variates
corresponding to any observed series of variates may be completed
very rapidly by means of a so-called continuous process. To
illustrate, we found that for the data of Table I, page 17,

$M_v = 51.230769$
$\sigma_v = 3.86610$


By formula (8), then

$t_i = \frac{v_i - 51.230769}{3.86610} = -13.2513 + .258659\, v_i$


In using this equation one should first subtract out 13.2513 from
the machine, and then set up .258659 as a multiplier. The product
of this multiplier by 51 will cause the value $t = -.059691$ to appear
on the machine. By merely subtracting the multiplier two times, the
value $t = -.577009$, corresponding to $v = 49$, appears. Continuing
this "build-over" method, the following set of standard variates is
readily obtained:


TABLE II

[Standard variates $t$ for the thirteen variates of Table I; the entries are illegible in this copy.]
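The "build-over" routine carries over directly to a short program. The following sketch assumes the rounded constants quoted in the text (13.2513 and .258659) and the order of the variates as read from Table I; the function name `build_over` is hypothetical. Each new value is reached from the previous one by adding or subtracting the multiplier as many times as the variates differ, just as on the machine.

```python
# Constants quoted in the text: M_v / sigma_v and 1 / sigma_v, as rounded there.
CONSTANT = 13.2513
MULTIPLIER = 0.258659

# The thirteen variates in the order of Table I.
variates = [51, 49, 60, 53, 48, 51, 42, 50, 51, 52, 54, 53, 52]


def build_over(vs):
    """Standard variates obtained by stepping the multiplier from one variate to the next."""
    t = -CONSTANT + MULTIPLIER * vs[0]   # first value set up in full
    results = [t]
    for previous, current in zip(vs, vs[1:]):
        # Add (or subtract) the multiplier once per unit of difference.
        t += MULTIPLIER * (current - previous)
        results.append(t)
    return results


standard_variates = build_over(variates)
direct = [-CONSTANT + MULTIPLIER * v for v in variates]
```

The first two values reproduce the figures of the text, $-.059691$ for $v = 51$ and $-.577009$ for $v = 49$, and the stepped values agree with those computed directly from the linear formula.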


It is scarcely an exaggeration to state that the theory of mathe- 
matical statistics hinges on standard units. Although in many prob- 


lems this might not appear on the surface, yet we shall see that the 
fact is nevertheless true. 


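9. The properties of the moments of standard variates are
interesting and important. Thus

(9) $\mu'_{1:t} = M_t = 0$

since

$M_t = \frac{1}{N}\sum t = \frac{1}{N}\sum \frac{v - M_v}{\sigma_v} = \frac{1}{N\sigma_v}\sum \bar v = 0$ (see formula 5a)


Referring to formula (6) we see that

$\mu_{2:t} = \mu'_{2:t} - M_t^2$
$\mu_{3:t} = \mu'_{3:t} - 3\mu'_{2:t} M_t + 2 M_t^3$


But since $M_t$ has already been proven equal to 0,

$\mu_{2:t} = \mu'_{2:t}$
$\mu_{3:t} = \mu'_{3:t}$


Which is an important simplification in the moments of the standard
variates.

(10) $\mu_{2:t} = 1$

for $\mu_{2:t} = \frac{1}{N}\sum t^2 = \frac{1}{N}\sum\left(\frac{v - M_v}{\sigma_v}\right)^2 = \frac{1}{\sigma_v^2}\cdot\frac{1}{N}\sum \bar v^2 = \frac{\mu_{2:v}}{\sigma_v^2} = 1$ (see formula 7)

(11) $\mu_{3:t} = \frac{\mu_{3:v}}{\sigma_v^3} = \frac{\mu_{3:v}}{\mu_{2:v}^{3/2}}$

for $\mu_{3:t} = \frac{1}{N}\sum t^3 = \frac{1}{N}\sum\left(\frac{\bar v}{\sigma_v}\right)^3 = \frac{\mu_{3:v}}{\sigma_v^3}$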



We see, therefore, that although the values of $M_t$ and $\sigma_t$ are
always 0 and 1 respectively, the value of $\mu_{3:t}$ will possess an
abstract value depending, nevertheless, upon the variates themselves.
The expression, $\mu_{3:t}$, is known as the coefficient of skewness and
is denoted by the symbol $\alpha_{3:v}$, i.e.


(12) $\alpha_{3:v} = \mu_{3:t} = \dfrac{\mu_{3:v}}{\sigma_v^3}$
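The properties stated in formulas (9) through (12) can be checked numerically. The following is a minimal sketch, not the author's procedure: the seven variates are hypothetical values chosen only for illustration, and `moments_about_mean` is an illustrative helper name.

```python
from math import sqrt


def moments_about_mean(values):
    """Return the mean and the second and third moments taken about that mean."""
    n = len(values)
    mean = sum(values) / n
    mu2 = sum((v - mean) ** 2 for v in values) / n
    mu3 = sum((v - mean) ** 3 for v in values) / n
    return mean, mu2, mu3


# Hypothetical variates, chosen only for illustration (not the data of Table I).
variates = [44, 47, 49, 50, 51, 53, 58]

mean_v, mu2_v, mu3_v = moments_about_mean(variates)
sigma_v = sqrt(mu2_v)

# Standard variates t = (v - M_v) / sigma_v.
t = [(v - mean_v) / sigma_v for v in variates]
mean_t, mu2_t, mu3_t = moments_about_mean(t)

skewness = mu3_v / sigma_v ** 3   # right-hand side of formula (12)
print(mean_t, mu2_t, mu3_t, skewness)
```

Up to rounding in floating-point arithmetic, the mean of the standard variates is 0, their second moment is 1, and their third moment coincides with the skewness of the original series.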


Summary of Section I. From the viewpoint of Elementary Mathematical
Statistics, we characterize a series of variates by its


(a) number, $N$,
(b) mean, $M_v$,
(c) standard deviation, $\sigma_v$, and
(d) skewness, $\alpha_{3:v}$.


The moments about the mean, $\mu_{n:v}$, are introduced solely to
facilitate the determination of $\sigma_v$ and $\alpha_{3:v}$. Other
moments, $\mu'_{n:v}$, are used to simplify the numerical calculation of
the moments about the mean, $\mu_{n:v}$.


Verbally, we may state that the mean serves as a convenient aver- 
age, and the standard deviation measures the concentration of the vari- 
ates about their mean. 


A thorough discussion of the significance of the coefficient of
skewness must be slightly deferred. We may say at this time merely
that the value of $\alpha_{3:v}$ depends obviously upon the value of
$\mu_{3:v}$, and that a glance at the last column of Table I will lend
weight to the statement that a positive or negative skewness indicates
a weighted preponderance of those variates which are considerably
greater than, or less than the mean, respectively.


Finally, the operations of mathematical statistics, and even certain
comparisons in descriptive statistics, require that we introduce
the notion of a standard variate, defined as follows:


$t_i = \dfrac{v_i - M_v}{\sigma_v}$


Section II.


INDIRECT METHOD OF OBTAINING ELEMENTARY FUNCTIONS


10. One of the fundamental theorems of moments states that if
a constant be added to, or subtracted from each variate of a series,
the moments computed about the mean for the revised series will be
identical with the corresponding moments of the original series. By
way of a simple example:


The mean of the following five variates is 138, consequently the
values of $\bar{v}$ are as given below:

[Table of the five variates and their deviations not recoverable.]


If we subtract, say, 130 from each of the variates, then for the
revised series $x_1$, $x_2$, $x_3$, $x_4$ and $x_5$,

[Table of the revised series not recoverable.]


The value subtracted, 130, is termed the provisional mean, and in
general is designated by the symbol, $M_0$. It follows, therefore, that


(13) $x_i = v_i - M_0$

(14) $M_v = M_0 + M_x$

(15) $\bar{x}_i = \bar{v}_i$

(16) $\mu_{n:x} = \mu_{n:v}$


It is understood that the functions of $x$ are defined in precisely
the same manner as corresponding functions of $v$, that is


$M_x = \dfrac{\sum x}{N}$

$\bar{x}_i = x_i - M_x$

$\mu_{n:x} = \dfrac{\sum \bar{x}^n}{N}$


etc.


11. Formula (13) follows from definition, although (14), seemingly
self-evident, needs proof. Thus by (13)

$v_1 = M_0 + x_1$
$v_2 = M_0 + x_2$
$v_3 = M_0 + x_3$
. . . . . . .
$v_N = M_0 + x_N$

$\sum v = N M_0 + \sum x$

Dividing both sides through by $N$

$\dfrac{\sum v}{N} = M_0 + \dfrac{\sum x}{N}$

$M_v = M_0 + M_x$   Q.E.D.


Formula (15) is proved by means of (13) and (14):

$\bar{x}_i = x_i - M_x$
$= (v_i - M_0) - (M_v - M_0)$   (Formulae 13 and 14)
$= v_i - M_v$
$= \bar{v}_i$   Q.E.D.


Since the moments about the mean are computed from these deviations,
and we have just shown that always for corresponding values

$\bar{v}_i = \bar{x}_i$

the truth of (16) is apparent.
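The theorem of paragraph 10 and formulas (13) to (16) can be verified with exact fractions. The sketch below is illustrative only: the five variates are hypothetical (they merely share the mean 138 and the provisional mean 130 of the example), and `central_moment` is an assumed helper name.

```python
from fractions import Fraction


def central_moment(values, order):
    """Moment of the given order about the mean, computed exactly."""
    mean = Fraction(sum(values), len(values))
    return sum((value - mean) ** order for value in values) / len(values)


# Hypothetical variates with mean 138 (illustrative; not the article's own five values).
v = [130, 134, 137, 141, 148]
provisional_mean = 130
x = [value - provisional_mean for value in v]   # formula (13)

mean_v = Fraction(sum(v), len(v))
mean_x = Fraction(sum(x), len(x))

print(mean_v == provisional_mean + mean_x)             # formula (14)
print(central_moment(v, 2) == central_moment(x, 2))    # formula (16), n = 2
print(central_moment(v, 3) == central_moment(x, 3))    # formula (16), n = 3
```

Because the arithmetic is exact, the comparisons are true equalities rather than approximate agreements.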


12. A comparison of Tables III and I will reveal an advantage
of the indirect over the direct method of calculation.


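TABLE III

[Body of the table (variates and their deviations from the provisional mean 50) not recoverable.]


$N = 13$,   $\sum x = 16$,   $M_x = \dfrac{16}{13}$

$\mu'_{2:x} = \dfrac{214}{13}$,   $\mu_{2:x} = \mu'_{2:x} - M_x^2 = \dfrac{2526}{13^2}$

$\mu'_{3:x} = \dfrac{616}{13}$,   $\mu_{3:x} = \mu'_{3:x} - 3\mu'_{2:x} M_x + 2M_x^3 = \dfrac{-21240}{13^3}$

$\sigma_x = \sqrt{\mu_{2:x}} = 3.86610$

$\alpha_{3:x} = \dfrac{\mu_{3:x}}{\sigma_x^3} = -.167303$


$M_v = 50 + \dfrac{16}{13} = 51\tfrac{3}{13}$
$\sigma_v = \sigma_x = 3.86610$
$\alpha_{3:v} = \alpha_{3:x} = -.167303$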











It will be observed that the values


$\mu_{2:x} = \dfrac{2526}{13^2}$ and $\mu_{3:x} = \dfrac{-21240}{13^3}$


agree exactly with those of Table I, namely


$\mu_{2:v} = \dfrac{2526}{169}$ and $\mu_{3:v} = \dfrac{-21240}{2197}$


The following will illustrate an important advantage of the indirect
method of determining the moments, $\mu_{n:v}$. Let us suppose
that after computing the values of $M_v$, $\sigma_v$ and $\alpha_{3:v}$
for the 13 variates of Table I we desire to delete the 13th variate,
$v_{13} = 52$, and compute the values of $M$, $\sigma$ and $\alpha_3$
for the remaining twelve variates.


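By the direct method of Table I, the revision would be quite
laborious, but by the indirect method of Table III, revisions are made
easily, as follows:


$N = 13 - 1 = 12$,  $\sum x = 16 - 2 = 14$,  $\sum x^2 = 214 - 4 = 210$,
$\sum x^3 = 616 - 8 = 608$


Consequently

$M_x = \dfrac{14}{12}$

$\mu'_{2:x} = \dfrac{210}{12}$,   $\mu_{2:x} = \mu'_{2:x} - M_x^2 = \dfrac{581}{6^2}$

$\mu'_{3:x} = \dfrac{608}{12}$,   $\mu_{3:x} = \mu'_{3:x} - 3\mu'_{2:x} M_x + 2M_x^3 = \dfrac{-1600}{6^3}$

$\sigma_x = \dfrac{\sqrt{581}}{6} = 4.01732$

$\alpha_{3:x} = \dfrac{-1600}{581\sqrt{581}} = -.114250$


$M_v = 50 + \dfrac{14}{12} = 51\tfrac{1}{6}$

$\sigma_v = \sigma_x = 4.01732$

$\alpha_{3:v} = \alpha_{3:x} = -.114250$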
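The indirect method and the revision just described can be sketched as follows. The totals are those quoted in the text for Table III; the function name `summary_from_sums` and the tuple it returns are assumptions made for this illustration, not part of the original method.

```python
from fractions import Fraction
from math import sqrt


def summary_from_sums(n, s1, s2, s3, provisional_mean):
    """Mean, moments, sigma and skewness from N and the sums of x, x^2 and x^3.

    Here x = v - provisional_mean, so sigma and skewness apply to v as well.
    """
    m_x = Fraction(s1, n)
    mu2 = Fraction(s2, n) - m_x ** 2
    mu3 = Fraction(s3, n) - 3 * Fraction(s2, n) * m_x + 2 * m_x ** 3
    sigma = sqrt(float(mu2))
    return provisional_mean + m_x, mu2, mu3, sigma, float(mu3) / sigma ** 3


# Totals quoted for Table III (13 variates, provisional mean 50).
n, s1, s2, s3 = 13, 16, 214, 616
full = summary_from_sums(n, s1, s2, s3, 50)

# Delete the variate v = 52, i.e. x = 2: adjust each total, then recompute.
x = 52 - 50
revised = summary_from_sums(n - 1, s1 - x, s2 - x ** 2, s3 - x ** 3, 50)

print(full)
print(revised)
```

Only the four totals need to be stored; adding, deleting or correcting a variate changes each total by a single term, which is why the revision is so much lighter than recomputing every deviation from a new mean.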











13. In a word, revisions of series arising from


(a) increasing or decreasing the number of variates,
(b) combining two or more series, or
(c) correcting the original variates


together with the smaller numbers that result from employing
the indirect method, lead us ordinarily to avoid using the direct method
of Section I in computing the fundamental functions, mean, standard
deviation and skewness.


In practice, one continually faces the problem of revision. Thus, 
in business statistics, publications serving as sources of data frequently 
are obliged to present revisions for estimates made in previous issues. 
Moreover, monthly and annual endeavors to bring statistics up to date
require the addition of variates to series. In problems arising in the 
field of psychology and education, it may develop after preliminary 
calculations have been made that one or more observations of the 
original series must be deleted due to the presence of factors such as 
unusual physical or mental impairment at the time of examination, 
cheating, etc. Again, we may desire to combine the statistics for sev- 
eral distinct intervals, for several classes, or for various schools of a 
city or state, etc. 


In the numerical examples above, calculations were made in 
terms of fractions, rather than decimals, in order to emphasize the 
fact that the direct and indirect methods will yield identical results. 
Ordinarily, decimals are employed, and the results will consequently 
differ slightly. 


Section III 


FREQUENCY DISTRIBUTIONS 


14. In dealing with large groups of quantitative data, the
computation of the elementary statistical functions and an appreciation
of the variation in the magnitudes of the series of measurements is
greatly facilitated by systematically presenting the data in the form of
a frequency distribution. Such a distribution may present in tabular
form


(a) each different variate observed, and
(b) the number of times that each different variate was observed
in the investigation.


It is evident at the very outset, therefore, that if a frequency
distribution merely reproduces precisely the same data that might
otherwise have been listed serially, the values of $M$, $\sigma$ and
$\alpha_3$ computed from such a frequency distribution must correspond
exactly with the values of $M$, $\sigma$ and $\alpha_3$ that would have
been obtained by the serial method. This serial method has been
considered in the two preceding sections.


15. As an illustration, suppose that we consider the complete
table from which the 13 variates, used in earlier computations, were
taken. Since, according to the regulations, 17,000 numbers were
withdrawn, we shall have 170 groups of one hundred numbers each,
consequently 170 variates. These are listed below.

We shall see that one can compute the fundamental functions from
the frequency distribution more readily than from Table IV. Again,
certain phenomena are apparent at a glance at Table V, though by no
means evident from a short inspection of Table IV. Thus the range
of the variates is immediately observed in Table V, and the degree of
symmetry in the distribution can be guessed rather accurately by one
accustomed to computing the coefficient of skewness from distributions.


TABLE IV


Number of even numbers in 170 samples of 100 numbers each.
U. S. Order of Call, 1918


44 37 44 53 52 50 51 47 56 44


The frequency distribution for Table IV may be obtained readily
by means of the “cross-five” method as follows:


TABLE V


Frequency Distribution for Data of
Table IV


Variate   Tabulation        Frequency

  37      ||                    2
  38                            0
  39      |                     1
  40                            0
  41      |                     1
  42      HH ||                 7
  43      |||                   3
  44      HH ||||               9
  45      HH HH                10
  46      HH HH                10
  47      HH HH                10
  48      HH HH ||             12
  49      HH HH |              11
  50      HH HH HH             15
  51      HH HH ||||           14
  52      HH ||||               9
  53      HH HH ||||           14
  54      HH HH |              11
  55      HH HH |              11
  56      HH ||                 7
  57      ||||                  4
  58      |||                   3
  59      |                     1
  60      |||                   3
  61                            0
  62      |                     1
  63                            0
  64      |                     1
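The tallying that produces a table such as Table V can be sketched with a counter. This is a hedged illustration: only the ten variates visible in Table IV are used, the helper `tally` is hypothetical, and the string "HH" is merely a typed stand-in for a crossed group of five strokes.

```python
from collections import Counter

# The row of Table IV that is legible here; the remaining 160 variates are not reproduced.
row = [44, 37, 44, 53, 52, 50, 51, 47, 56, 44]


def tally(count):
    """Render a count as cross-five tally marks: 'HH' per group of five, '|' per unit."""
    fives, rest = divmod(count, 5)
    groups = ["HH"] * fives
    if rest:
        groups.append("|" * rest)
    return " ".join(groups)


frequency = Counter(row)
for variate in sorted(frequency):
    print(variate, tally(frequency[variate]), frequency[variate])
```

Applied to all 170 variates, the same loop would yield the full frequency column of the distribution.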


16. The above type of distribution should be differentiated from
others in which it has been found advantageous to combine the variates
into classes and likewise to group together the corresponding
frequencies. A distribution of grades will serve to illustrate this
second type of distribution.


TABLE VI 


Distribution of Examination 
Grades of 168 Students 


Frequency 


Total 





Such a table does not represent exactly the original data in which
the grades were recorded for each student as an integral number of
per cents; nevertheless, it gives a very good idea of the general form
of the distribution and enables us to compute the fundamental
functions with a considerable degree of accuracy.


17. Discrete Variates. The distribution of Table V is obviously
one in which the variates can, from their very nature, be expressed
only as integers. A distribution of this type is termed one of discrete
variates, or one of a discrete variable. Common illustrations of this
type are to be found in distributions of the number of individuals in
a family, the number of petals on a flower, the number of coins turning
up heads, etc.














18. Continuous Variates. In the majority of distributions the
variates by their nature may differ by infinitesimals, and the observed
values, as recorded, are merely more or less accurate estimates of the
true values, which never can be established with absolute accuracy by
any method of measurement. Thus the variates in the case of heights
may be correct to the nearest inch, one-hundredth of an inch, or even
the one millionth part of an inch, etc., but theoretically it can be shown
that the chance that any measurement of a continuous variable is exact
is about one in infinity. A frequency table for the distribution of
continuous variates must always, therefore, be one of grouped frequencies.


19. The fundamental differences between distributions which 
may be classified as 


(a) discrete 
(b) grouped discrete, and 
(c) continuous 


are of vital importance whenever the accurate determination of the 
mean, standard deviation, or skewness, is concerned. We shall now
illustrate in detail and by numerical examples the procedure which 
should be followed in each case. 


20. Frequency Distributions of Discrete Variates. 


If 180 dice were thrown, and a throw of a six spot counted a
success, then the expected frequencies of successes that would be
obtained in one thousand such trials are as follows:


TABLE VII

[Body of the table (variates, deviations and expected frequencies) not recoverable.]


$\sum x^3 f = 11259$

$\mu_{2:x} = 24.6863$     $\sigma_x = 4.96853$
$\mu_{3:x} = 13.2586$     $\sigma_x \mu_{2:x} = 122.655$

$\alpha_{3:x} = .108097$


$M_v = 29.973$,   $\sigma_v = 4.96853$,   $\alpha_{3:v} = .108097$
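The expected frequencies described in paragraph 20 can be sketched as below. This rests on an assumption the article does not spell out: that each expected frequency is the binomial probability of the given number of sixes among 180 dice, multiplied by the 1000 trials. The rounding rule used in the printed table is not known, and the names are hypothetical.

```python
from math import comb

DICE = 180
TRIALS = 1000


def expected_frequency(successes):
    """Expected number of trials, out of 1000, showing the given number of sixes."""
    ways = comb(DICE, successes) * 5 ** (DICE - successes)   # favourable arrangements
    return TRIALS * ways / 6 ** DICE


# Expected frequency for 35 sixes, which is x = 5 when 30 is the provisional mean.
print(round(expected_frequency(35)))

# The unrounded frequencies account for all 1000 trials and average 30 sixes.
total = sum(expected_frequency(k) for k in range(DICE + 1))
mean = sum(k * expected_frequency(k) for k in range(DICE + 1)) / total
print(total, mean)
```

Under this assumption the frequency at 35 successes comes out near 46, which agrees with the remark in the Explanation that the value $x = 5$ would be added 46 times.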











Explanation. Since this distribution of discrete variates is an
exact reproduction of the original data listed serially, we know that
the moments obtained by the frequency distribution method must be
identical with those which would have resulted had the serial method
been employed. In fact


(17)   $\sum f = N$,
       $\sum x f = \sum x$,
       $\sum x^2 f = \sum x^2$, and
       $\sum x^3 f = \sum x^3$


Numerically, $\sum x^n f$ is absolutely equivalent to $\sum x^n$. However,
$\sum x^n f$ implies more; it indicates a brief and systematic method of
attaining a total in which multiplication replaces repeated additions.
Thus, in the serial method the value $x = 5$ would be added 46 times
during the numerical determination of $\sum x$. In the frequency
distribution method one multiplication, 5 × 46, represents likewise the
contribution of this variate to the total $\sum x f = \sum x$.


If a computing machine be not available, the headings of Table
VII should be


$v$     $x$     $f$     $xf$     $x^2 f$     $x^3 f$


and the totals $\sum x^n f$ obtained by a detailed process. With the aid of
a computing machine the values of $\sum x^n f$ may be obtained readily by
a continuous process, and it is necessary to record only the totals.


Since


$(x+1)^3 = x^3 + 3x^2 + 3x + 1$


it follows that


(18) $\sum (x+1)^3 f = \sum x^3 f + 3\sum x^2 f + 3\sum x f + \sum f$


Formula (18) is known as Charlier’s check. By associating with
each value, $f$, the value of $x^3$ appearing on the next lower line, the
value of $\sum (x+1)^3 f$ may be obtained as readily as that of $\sum x^3 f$.
Then if equation (18) be satisfied we may assume with a considerable
degree of confidence that all five summations have been accurately
determined.
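Charlier's check can be sketched on the frequencies of Table V. The frequencies below are as read from that table (they total 170, the stated number of samples); the provisional mean of 50 and the helper `weighted_sum` are choices made here for illustration only.

```python
# Frequencies as read from Table V (variate -> frequency).
table_v = {
    37: 2, 38: 0, 39: 1, 40: 0, 41: 1, 42: 7, 43: 3, 44: 9, 45: 10, 46: 10,
    47: 10, 48: 12, 49: 11, 50: 15, 51: 14, 52: 9, 53: 14, 54: 11, 55: 11,
    56: 7, 57: 4, 58: 3, 59: 1, 60: 3, 61: 0, 62: 1, 63: 0, 64: 1,
}
PROVISIONAL_MEAN = 50   # assumed here; any convenient value would serve


def weighted_sum(power, shift=0):
    """Sum of (x + shift)^power * f over the table, with x = v - provisional mean."""
    return sum((v - PROVISIONAL_MEAN + shift) ** power * f for v, f in table_v.items())


n, s1, s2, s3 = (weighted_sum(power) for power in range(4))

left = weighted_sum(3, shift=1)          # sum of (x + 1)^3 f, formed independently
right = s3 + 3 * s2 + 3 * s1 + n         # right-hand side of formula (18)
print(n, left == right)
```

In hand computation the left side is accumulated separately from the other four totals, so agreement between the two sides is evidence that none of them contains a slip.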


It follows that we may now write, employing (17),


(19) $\mu'_{n:x} = \dfrac{\sum x^n f}{\sum f}$


and observe that here, as in the serial method,


$\mu_{2:x} = \mu'_{2:x} - M_x^2$
$\mu_{3:x} = \mu'_{3:x} - 3\mu'_{2:x} M_x + 2M_x^3$
$M_v = M_0 + M_x$
$\mu_{n:v} = \mu_{n:x}$
etc.


21. The Grouping of Discrete Variates. Occasionally frequency
distributions of discrete variates contain so many different variates
that some sort of grouping must be employed. Thus, the distribution
of Table VII and the numerical calculations may be abbreviated as in
Table VIII.


Explanation. The class mark of a class is defined as the arithmetic
mean of the greatest and least variates that can occur within that
class. In Table VIII we might have used the class marks as values of
$v$, but the use of a provisional mean, as has already been
demonstrated, saves a large amount of labor.





TABLE VIII (Unadjusted)

[Body of the table (classes, class marks and frequencies) not recoverable.]


$\sum f = 1000$           $M_0 = 30$,  $\lambda = 3$
$\sum x f = -9$           $M_x = -.009$
$\sum x^2 f = 2817$       $\mu'_{2:x} = 2.817$
$\sum x^3 f = 405$        $\mu'_{3:x} = .405$

$\mu_{2:x} = 2.81692$     $\sigma_x = 1.67837$

$\mu_{3:x} = .481058$     $\sigma_x \mu_{2:x} = 4.72783$

$\alpha_{3:x} = .101750$

$M_v = 29.973$,  $\sigma_v = 5.03511$,  $\alpha_{3:v} = .101750$


The class interval is defined as the common difference between
two consecutive class marks. In the example of Table VIII, the class
interval has been chosen as the unit of $x$, consequently $M_x$ and
$\sigma_x$ are expressed in class units. If $\lambda$ denotes the class
interval for a distribution, then

(20) $M_v = M_0 + \lambda M_x$, and

(21) $\sigma_v = \lambda \sigma_x$


Thus in Table VIII we had


$M_v = 30 + 3(-.009) = 29.973$
$\sigma_v = 3(1.67837) = 5.03511$








Since the skewness is an abstract number, completely independent
of the unit employed


(22) $\alpha_{3:v} = \alpha_{3:x}$


22. Table IX shows in the second, third, and fourth columns the
values of $M_v$, $\sigma_v$ and $\alpha_{3:v}$ which are obtained by
various groupings of the data of Table VII. The grouping employed in
Table VIII is listed as D (3:2) in Table IX, the 3 denoting the number
of different variates in each group, and the 2 designating the position
of the first observed variate (i.e. $v = 15$) in the first grouping.
Thus the classes of the grouping symbolized by D (6:4) would be


12-17

18-23

24-29
etc.


From Table IX it may be observed that, although all of the values
of $M_v$ agree to a rather remarkable extent, nevertheless the unadjusted
values of $\sigma_v$ reveal the fact that an increase in the class
interval is as a rule accompanied by an increase in the associated
standard deviation and a decrease in the corresponding skewness.


23. In computing the moments $M_x$, $\mu_{2:x}$, and $\mu_{3:x}$ for
distributions of grouped frequencies, the assumption is made that each
variate in a class may be treated as being numerically equal to the class
mark. A mathematical investigation that lies beyond the scope of an
elementary course shows that in the computation of $M_x$ and
$\mu_{3:x}$ it is entirely legitimate to treat each variate after this
manner, but the demonstration also reveals that grouping tends to
introduce a systematic error into the value of $\mu_{2:x}$. To eliminate
this systematic tendency we find that one should introduce a correction
and write


(23) $\mu_{2:x} = \mu'_{2:x} - M_x^2 - \dfrac{k^2 - 1}{12k^2}$

where $k$ denotes the number of different variates that are grouped
together in each class. Thus, in Table VIII we should have introduced
as a correction


$\dfrac{3^2 - 1}{12 \cdot 3^2} = \dfrac{2}{27} = .074074$














TABLE IX


Comparison of Adjusted and Unadjusted Values of $\sigma_v$ and $\alpha_{3:v}$

(A hyphen marks an entry that is not legible.)

   (1)         (2)       (3)       (4)       (5)       (6)
                          Unadjusted          Adjusted
 Grouping      M_v      sigma_v   alpha_3   sigma_v   alpha_3

 D(1:1)         -        4.969      -        4.969     .108
 D(2:1)         -        4.992      -        4.967     .108
 D(2:2)         -        4.995      -        4.970     .108
 Avg. D(2)      -          -        -          -       .108
 D(3:1)         -          -       .109      4.963     .113
 D(3:2)         -          -       .102      4.968     .106
 D(3:3)         -          -       .101      4.974     .105
 Avg. D(3)      -          -        -          -       .108
 D(4:1)       29.968       -        -          -       .103
 D(4:2)       29.976       -        -          -       .112
 D(4:3)       29.976       -        -          -       .112
 D(4:4)       29.972       -        -          -       .105
 Avg. D(4)      -          -        -        4.968     .108
 D(5:1)       29.975     5.160     .105      4.962     .118
 D(5:2)       29.975     5.170     .097      4.972     .109
 D(5:3)       29.970     5.167     .094      4.970     .105
 D(5:4)       29.970     5.163     .085      4.966     .096
 D(5:5)       29.975     5.170     .100      4.972     .112
 Avg. D(5)    29.973     5.166     .096      4.968     .108
 D(6:1)       29.974     5.247     .107      4.961     .126
 D(6:2)       29.976     5.256     .099      4.971     .117
 D(6:3)       29.972     5.259     .087      4.974     .102
 D(6:4)       29.974     5.250     .085      4.965     .100
 D(6:5)       29.970     5.251      -        4.966     .094
 D(6:6)       29.972       -        -          -       .108
 Avg. D(6)    29.973       -        -          -       .108
 D(7:1)       29.977       -        -          -       .121
 D(7:2)       29.971       -        -          -       .117
 D(7:3)       29.972       -        -          -       .109
 D(7:4)       29.966       -        -          -       .087
 D(7:5)       29.974       -        -          -       .110
 D(7:6)       29.975       -        -          -       .108
 D(7:7)       29.976       -       .084      4.978     .105
 Avg. D(7)      -          -        -          -       .108








This would have resulted in the following revision:


$\mu_{2:x} = 2.74285$      $\sigma_x = 1.65616$
$\mu_{3:x} = .481058$      $\sigma_x \mu_{2:x} = 4.54260$
$\alpha_{3:x} = .105899$
$M_v = 29.973$,  $\sigma_v = 4.96848$,  $\alpha_{3:v} = .105899$
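The unadjusted and adjusted values for Table VIII can be reproduced from the printed totals. The sketch below assumes the correction term $(k^2-1)/(12k^2)$ in class units, consistent with the value .074074 quoted above for $k = 3$; the function name `grouped_summary` and its arguments are illustrative.

```python
from math import sqrt


def grouped_summary(n, s1, s2, s3, provisional_mean, interval, k=None):
    """Mean, sigma and skewness in original units from totals taken in class units.

    k is the number of different variates grouped in each class;
    k=None leaves the second moment unadjusted.
    """
    m_x = s1 / n
    mu2 = s2 / n - m_x ** 2
    if k is not None:
        mu2 -= (k ** 2 - 1) / (12 * k ** 2)    # grouping correction to the second moment
    mu3 = s3 / n - 3 * (s2 / n) * m_x + 2 * m_x ** 3
    sigma_x = sqrt(mu2)
    mean_v = provisional_mean + interval * m_x   # formula (20)
    sigma_v = interval * sigma_x                 # formula (21)
    return mean_v, sigma_v, mu3 / sigma_x ** 3


# Totals of Table VIII: N = 1000, provisional mean 30, class interval 3.
unadjusted = grouped_summary(1000, -9, 2817, 405, 30, 3)
adjusted = grouped_summary(1000, -9, 2817, 405, 30, 3, k=3)

print(unadjusted)
print(adjusted)
```

The mean is untouched by the correction, the standard deviation falls and the skewness rises, which is the direction of the systematic error described in paragraph 22.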





Again, for $k = 7$ we would use

$\mu_{2:x} = \mu'_{2:x} - M_x^2 - \dfrac{7^2 - 1}{12 \cdot 7^2} = \mu'_{2:x} - M_x^2 - \dfrac{4}{49}$


When the simple adjustment of formula (24) is made, Table IX
shows that the systematic errors in the values of $\sigma$ and
$\alpha_3$, caused by grouping, are eliminated. Thus in columns 5 and 6
the averages for each group are constant, consequently the errors
remaining are accidental variations, which, due to a complete lack of
compensation, still remain, but such discrepancies are not serious.


It should be noted that for distributions of discrete variates in
which no grouping occurs, as in Table VII, the correction vanishes,
since for $k = 1$


(24) $\dfrac{k^2 - 1}{12k^2} = 0$


24. Frequency Distributions of Continuous Variates. The
following will serve as an illustration of the method of obtaining the
fundamental functions for a distribution of continuous variates.

















TABLE X


Weights of 1000 Female Students

(Original Measurements Made to Nearest 1/10 lb.)

$M_0 = 114.95$

 Class          Class mark
 (Pounds)          $v$           $f$       $x$

  70- 79.9        74.95            2       -4
  80- 89.9        84.95           16       -3
  90- 99.9        94.95           82       -2
 100-109.9       104.95          231       -1
 110-119.9       114.95          248        0
 120-129.9       124.95          196        1
 130-139.9       134.95          122        2
 140-149.9       144.95           63        3
 150-159.9       154.95           23        4
 160-169.9       164.95            5        5
 170-179.9       174.95            7        6
 180-189.9       184.95            1        7
 190-199.9       194.95            2        8
 200-209.9       204.95            1        9
 210-219.9       214.95            1       10

 Total                          1000

$\sum f = 1000$            $M_0 = 114.95$
$\sum x f = 379$           $M_x = .379$ class units
$\sum x^2 f = 3089$        $\mu'_{2:x} = 3.089$
$\sum x^3 f = 8131$        $\mu'_{3:x} = 8.131$
$\mu_{2:x} = 2.86203$      $\sigma_x = 1.69175$
$\mu_{3:x} = 4.72769$      $\sigma_x \mu_{2:x} = 4.84184$
$\alpha_{3:x} = .976424$
$M_v = 118.74$ lbs.,  $\sigma_v = 16.9175$ lbs.,  $\alpha_{3:v} = .976424$





Explanation: The class mark has previously been defined as the
mean of the greatest and least variates that can be included in a class.
Since the original measurements were made to the nearest tenth of a
pound, the true limits of the 150-159.9 class are 149.95-159.95, and
their mean is 154.95, which accordingly is the class mark in this
instance. If the original measurements had been made to the nearest
pound, then the classes would be written


. . . . . . .

150.0-159.0
160.0-169.0

. . . . . . .


and the true limits of the 150.0-159.0 class would be 149.5 and 159.5
pounds respectively, and the corresponding class mark would be 154.5
lbs. It is apparent, therefore, that a table of continuous variates
should specify clearly the accuracy with which the original
measurements were made, for the values of the class marks, and
consequently that of the mean, hinge on this point.












It will be noticed that in this example the class interval has again
been taken as the unit of $x$, and this fact must be taken into
consideration in determining the value of $M_v$ and $\sigma_v$.











Since the assumption is also made that the class mark may represent
the magnitudes of all variates occurring in that class, the question
of correcting the second moment, $\mu_{2:x}$, again arises. Since in
each class of a distribution of continuous variates an infinite number
of different variates may occur, the correction is in this case

$\lim_{k \to \infty} \dfrac{k^2 - 1}{12k^2} = \dfrac{1}{12}$

Therefore, corresponding to formula (24), we must write, in order
properly to adjust the second moment of a distribution of continuous
variates


(25) $\mu_{2:x} = \mu'_{2:x} - M_x^2 - \dfrac{1}{12}$


As before, neither the value of $M_x$ nor that of $\mu_{3:x}$ requires
adjustment.
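The computation for Table X, including the correction of 1/12, can be sketched directly from a frequency table in class units. One caution: the four lowest frequencies used here (2, 16, 82, 231) are the values consistent with the printed totals and should be treated as an assumption; the variable names are illustrative.

```python
from math import sqrt

# Table X in class units: x = (class mark - 114.95) / 10, mapped to frequency.
# The frequencies for x = -4 .. -1 are the values consistent with the printed
# totals (1000, 379, 3089, 8131); treat them as an assumption.
table_x = {-4: 2, -3: 16, -2: 82, -1: 231, 0: 248, 1: 196, 2: 122, 3: 63,
           4: 23, 5: 5, 6: 7, 7: 1, 8: 2, 9: 1, 10: 1}

PROVISIONAL_MEAN = 114.95
CLASS_INTERVAL = 10

n = sum(table_x.values())
s1 = sum(x * f for x, f in table_x.items())
s2 = sum(x ** 2 * f for x, f in table_x.items())
s3 = sum(x ** 3 * f for x, f in table_x.items())

m_x = s1 / n
mu2 = s2 / n - m_x ** 2 - 1 / 12                 # formula (25): continuous variates
mu3 = s3 / n - 3 * (s2 / n) * m_x + 2 * m_x ** 3
sigma_x = sqrt(mu2)

mean_v = PROVISIONAL_MEAN + CLASS_INTERVAL * m_x
sigma_v = CLASS_INTERVAL * sigma_x
skewness = mu3 / sigma_x ** 3

print(n, s1, s2, s3)
print(mean_v, sigma_v, skewness)
```

The first printed line reproduces the four totals of the table, and the second gives the mean and standard deviation in pounds together with the skewness, which is independent of the unit.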








Summary of Section III. The frequency distribution is a device
for presenting an extensive series of variates in a systematic and
compact form. Not only are the phenomena of aggregation more readily
perceptible by this method of presenting the data, but the calculations
of the fundamental functions are facilitated.


The formulae for obtaining the mean, standard deviation and
skewness are, with the exception of a single adjustment that may
arise, identical with those employed in the serial method. One need
only observe that


$N = \sum f$
$\sum x = \sum x f$
$\sum x^2 = \sum x^2 f$
$\sum x^3 = \sum x^3 f$

The adjustment referred to is that we should in general regard


$\mu_{2:x} = \mu'_{2:x} - M_x^2 - \dfrac{k^2 - 1}{12k^2}$


For ungrouped distributions of discrete variates this correction
vanishes, since in this instance $k = 1$. For distributions of
continuous variates, since here $k$ would approach infinity, the
correction is numerically equal to 1/12.


These corrections will remove systematic errors in the standard
deviation and skewness that arise from the phenomenon of grouping
complete frequency distributions.


Editor's Note: This abstract of Elementary Mathematical Statistics will be con- 
tinued in the May issue of the ANNALS. 

































BAYES’ THEOREM ¹


By


JOSEPH BERKSON





As for all established sciences, the typical problems of practical
statistics have become inveterately attached to their several neat and
convenient formulary solutions. To recall consideration of the basic
reasoning underlying every-day statistical practice that applies to an
elementary question may appear in the nature of an unnecessary
disturbance of prevailing peace. If the experience of the writer is
typical, however, vagueness or dubiousness of the premises inherent in
a rule applied by rote will emerge to plague one in the conclusions,
and a periodic return to fundamentals is as salutary for mental comfort
as for the integrity of science itself. In what follows, an attempt will
be made to go over the ground covered by Bayes’ Theorem, and to point
out its import for sound statistical reasoning. No claim is laid to
mathematical originality at any specific points, but in the approach and
synthesis will be found, we hope, a measure of instructive novelty.


A large class of statistical problems is typified in the following.
A standard machine is known, from long experience, to produce a
certain fraction $P$ of imperfect products. What is the probability that
in the next issue of $n$ products, a fraction $p$ will be imperfect?





We now present a related but not identical question. There is
no available knowledge concerning the general practice of a machine;
$n$ products are examined and a fraction $p$ found to be imperfect.
What is the probability that the machine turns out generally a fraction
$P$ of imperfect products? The distinction between the two questions
may be schematized as in Figure 1.


1. From the Department of Biometry and Vital Statistics of the School of
Hygiene and Public Health (Paper No. 125); and the Institute for Biological
Research of the Johns Hopkins University.


[Figure 1: arrows connecting the universe fractions $P_1$, $P_2$, ... with the sample fractions $p_1$, $p_2$, ...]


The values $P_1$, $P_2$, $P_3$, etc., represent serially all the various
fractions of imperfect products which might characterize particular
machines, each one, let us say, determined by some definite combination
of mechanical defects. Values $p_1$, $p_2$, $p_3$, etc., are the
fluctuating fractions of imperfect products that might appear in the
samples produced by these machines. Connected by arrows with $P_1$ are
the randomly varying values of $p$ that might result from $P_1$, with
$P_2$ those that might result from $P_2$, etc., the weight of the arrows
being proportional to the probability of the particular $p$ concerned.
It is to be noticed that each $P$ may give rise to any of a number of
$p$’s and that some of the $p$’s may result from any of a number of
$P$’s.


The first question in terms of the diagram is: “Given $P_1$, how
probable is it that $p_1$ shall result?” The second is: “Given $p_1$, how
probable is it that $P_1$ has been its source?” Answering the first, we
calculate in the realm of the $p$’s connected with $P_1$. In the second
we calculate in the realm of the $P$’s connected with $p_1$.


An answer to the first is given directly in terms of our every-day
statistical reasoning. We say that the $p$’s which result from $P$ can be
adequately described as a normal distribution with
$\sigma = \sqrt{P(1-P)/n}$, and from this the probability of any
particular $p$ calculated. The answer to the second is more difficult,
and was given in general terms first by Bayes (1) in the theorem known
by his name. Bayes’ Theorem is not frequently used in applied
statistics; yet the problems that arise in practical situations would
often seem to demand just such an answer as it provides. More often
than not do we have a specific sample and inquire about the probable
character of the universe from which it was drawn, in contra-distinction
to the situation in which the universe is known, and the questions
concern the possible samples.


The method of presenting the theorem here given will not follow 
rigidly any historical demonstration. Actually the calculation quan- 
titatively of an “inverse probability” or the “probability of causes,” 
was first given by Bayes. But he considered a purely geometric set-up 
and his solution was in terms of this conception. By implication he 
utilized a general principle first clearly stated later by Laplace, and 
furthermore, Laplace generalized the solution still more by arguing 
from the probability of a cause given by a particular sample, to the 
probability of the next sample. With this realized, then, that Bayes is 
to be credited with the original demonstration and Laplace for an im- 
portant extension, we may proceed to a demonstration which is not 
exactly that of either. 


I. Problem. We have an urn containing three balls. Each ball
is colored black or white, and each color is equally likely. We draw
one ball and it is black. What are the probable contents of the urn?
We argue that the following are the possibilities:


  I      II     III     IV
 www    wwb    wbb     bbb


All of these possibilities, we say, are equally likely a priori and
we have for the probabilities of the sample the following:


$p_s$ I, the probability of a black sample from I = 0

$p_s$ II, the probability of a black sample from II = 1/3

$p_s$ III, the probability of a black sample from III = 2/3

$p_s$ IV, the probability of a black sample from IV = 3/3


where $p_s$ I is the probability of the sample $s$ being drawn from urn
I, $p_s$ II from urn II, etc. We say now that the relative probabilities
of the various urns are in proportion to the probabilities of the sample
drawn, and we have


(a) P I : P II : P III : P IV = 0 : 1/3 : 2/3 : 3/3














where P I is the probability that, having drawn the ball, urn I was its
source, P II that urn II was the source, etc.


Also, since the ball must have been drawn from some one of the
urns, the total probability of one or another of the urns is unity and
we have


(b) P I + P II + P III + P IV = 1


From (a) and (b) we have therefore


P I   = 0
P II  = 1/6
P III = 2/6
P IV  = 3/6


We now extend the problem to the case where the a priori prob- 
abilities of the various possible urns are not equal. 


Suppose we say that there are many urns of the description I,
II, III, IV in a large chamber, and that these are in proportion
I : II : III : IV = 1 : 2 : 3 : 4. We now pick an urn at random and
draw from it a ball, which turns out to be black. What is the
probability that the urn is of some particular description? Proceeding
as before, we have for the probabilities of the sample being drawn from
the various urns the following:


$p_s$ I   = 1/10 × 0 = 0        (Probability of urn × probability of sample)

$p_s$ II  = 2/10 × 1/3 = 2/30

$p_s$ III = 3/10 × 2/3 = 6/30

$p_s$ IV  = 4/10 × 3/3 = 12/30


where $p_s$ I is the probability that such a sample $s$ be drawn from
urn I, etc.


And again on the principle that the probabilities of the urns are
in proportion to the probabilities of the sample drawn, we have


P I : P II : P III : P IV = 0 : 2/30 : 6/30 : 12/30


and as preceding

P I + P II + P III + P IV = 1


Therefore

P I   = 0
P II  = 2/20
P III = 6/20
P IV  = 12/20
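Both urn calculations follow one rule: multiply the a priori probability of each urn by the probability of the sample from that urn, then divide by the total. A minimal sketch with exact fractions follows; the function name `posterior` is an assumption made for this illustration.

```python
from fractions import Fraction


def posterior(priors, likelihoods):
    """Probability of each source, proportional to prior times probability of the sample."""
    joint = [prior * likelihood for prior, likelihood in zip(priors, likelihoods)]
    total = sum(joint)
    return [value / total for value in joint]


# Probability of drawing a black ball from urns I to IV (0, 1, 2, 3 black balls of 3).
black = [Fraction(k, 3) for k in range(4)]

# First case: all four urns equally likely a priori.
equal = posterior([Fraction(1, 4)] * 4, black)

# Second case: urns present in the proportion 1 : 2 : 3 : 4.
weighted = posterior([Fraction(w, 10) for w in (1, 2, 3, 4)], black)

print(equal)
print(weighted)
```

`Fraction` reduces to lowest terms, so the results print as 1/6, 1/3, 1/2 and 1/10, 3/10, 3/5, which are the same values as the 1/6, 2/6, 3/6 and 2/20, 6/20, 12/20 of the text.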












We shall now generalize this solution. 




















Let $\pi_1$, $\pi_2$, $\pi_3$, etc. be the a priori probabilities of the
various possible universes from which a sample is to be drawn. Let
$p_1$, $p_2$, $p_3$, etc., be the probability of the sample being drawn
from the respective universes. Then, a sample $s$ having been drawn,
the probability that its source is universe $r$ is given by


$P_r = \dfrac{\pi_r p_r}{\sum \pi p}$


If all the universes are equally likely (our first case above),
$\pi_1 = \pi_2 = \pi_3 = \pi$, and we have


(1) $P_r = \dfrac{p_r}{\sum p}$


If the equally probable universes are infinite in number, the $P$’s
varying by infinitesimal gradations from zero to unity, and $p$ may
assume any positive value less than 1, we may extend the last
formula (1) by use of the calculus as follows:








Let $x$ = any possible $P$ between 0 and 1. From a universe $x$
I draw a sample containing $r + s$ individuals, designated hereafter
as a sample $(r, s)$. The probability that it will contain $r$ successes
and $s$ failures is given by


$p_{(r,s)} = B_{r,s}\, x^r (1 - x)^s$


where $p_{(r,s)}$ is the probability of the sample $(r, s)$ and $B_{r,s}$
is the coefficient of the $(r+1)$th term in the Bernoulli expansion,
$= \dfrac{(r+s)!}{r!\,s!}$.


The probability of the sample $(r, s)$ coming from a universe
the $P$ of which lies between $x$ and $(x + dx)$ is therefore


${}_x^{x+dx}p_{(r,s)} = B_{r,s}\, x^r (1 - x)^s\, dx$


where ${}_x^{x+dx}p_{(r,s)}$ is the probability that the sample $(r, s)$
emanates from a universe whose $P$ lies between $x$ and $(x + dx)$.
If the universe from which the sample is drawn may have a $P$
anywhere between $a$ and $b$, the probability of the sample $(r, s)$ is


(2) ${}_a^b p_{(r,s)} = B_{r,s} \displaystyle\int_a^b x^r (1 - x)^s\, dx$


and the probability that $P$ is between $a$ and $b$ is therefore as in (1)


(3) ${}_a^b P = \dfrac{\displaystyle\int_a^b x^r (1 - x)^s\, dx}{\displaystyle\int_0^1 x^r (1 - x)^s\, dx}$


where ${}_a^b P$ is the probability that the universe from which the
sample $(r, s)$ was drawn has a $P$ between $a$ and $b$. This is Bayes’
Theorem in terms of the integral calculus.
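Formula (3) is a ratio of two integrals of the polynomial $x^r(1-x)^s$, so it can be evaluated exactly. The sketch below expands $(1-x)^s$ by the binomial theorem and integrates term by term; the function names are hypothetical, and the approach is one convenient way to evaluate the ratio, not the article's own.

```python
from fractions import Fraction
from math import comb


def integral(r, s, a, b):
    """Exact integral of x^r (1 - x)^s from a to b, integrating the expansion term by term."""
    total = Fraction(0)
    for j in range(s + 1):
        power = r + j + 1
        term = Fraction(b) ** power - Fraction(a) ** power
        total += comb(s, j) * (-1) ** j * term / power
    return total


def probability_between(r, s, a, b):
    """Formula (3): probability that P lies between a and b after a sample (r, s)."""
    return integral(r, s, a, b) / integral(r, s, 0, 1)


# After 3 successes and 1 failure, how probable is it that P exceeds one half?
print(probability_between(3, 1, Fraction(1, 2), 1))
```

The coefficient $B_{r,s}$ does not appear because it cancels between numerator and denominator.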


Now, we ask the further question, what is the probability of a second sample containing m successes and n failures¹ being drawn?

If x be the P of the universe from which the sample (m, n) is drawn, and if P may vary from 0 to 1, we have analogously with (2)

(4)    $${}_0^1P_{(m,n)} = E_{m,n}\int_0^1 x^m (1-x)^n\, dx$$

where ${}_0^1P_{(m,n)}$ is the probability that a sample (m, n) be drawn from universes whose P's vary between 0 and 1, and

$$E_{m,n} = \frac{(m+n)!}{m!\,n!}$$

1. Designated hereafter as the sample (m, n).





BAYES’ THEOREM 


The probability of the event (m, n) occurring from any particular universe is given by the product of the probability of that universe and the probability of the event. The total probability of the event (m, n), i.e., the probability that the event (m, n) occurs at all from any universe, is, therefore, given by the product of form (3) with 0 and 1 substituted for a and b and (4), as follows:


(5)    $$P_{(m,n)\cdot(r,s)} = \frac{(m+n)!}{m!\,n!}\cdot\frac{\int_0^1 x^{m+r}(1-x)^{n+s}\, dx}{\int_0^1 x^r (1-x)^s\, dx}$$

where $P_{(m,n)\cdot(r,s)}$ is the probability of a second sample (m, n) after a first sample (r, s) has been drawn.

This is Laplace's extension of Bayes' Theorem, somewhat modified.

Bayes' Solution.


It will be illuminating to derive this result by the method of Bayes. 
We shall follow his proof except to simplify his notation and to use 
the integral calculus where he used geometric demonstration. 


[Figure 1. The square billiard table ABCD.]







ABCD is a square billiard table. A ball is thrown and comes to rest at x', through which a line is drawn parallel to AC. A second ball is thrown; if it stops to the left side of the line x', we designate success, to the right, a failure. Before the first ball is thrown, what is the probability of the second ball succeeding r and failing s times in r plus s trials?

If the first ball comes to rest at x', the probability of a successful second throw is $\dfrac{x'}{CD} = p$, and of failure $\dfrac{CD - x'}{CD} = q$. The probability of r successes and s failures with the first ball at x' is then

$$\frac{(r+s)!}{r!\,s!}\, p^r q^s$$


Let us erect at each point x' along CD a distance y', so that

(6)    $$y' = CD \times \frac{(r+s)!}{r!\,s!}\, p^r q^s$$

and connect the summits forming a figure as shown in Figure 2. At each point, of course, y' will be different because $p = \dfrac{x'}{CD}$ and $q = \dfrac{CD - x'}{CD}$ will be different, but for any particular case, r and s remain constant.


The probability that the first ball shall fall between x' and (x' + dx') is $\dfrac{dx'}{CD}$, and that the second ball shall therefore succeed r and fail s times is $\dfrac{y'}{CD}$. That both shall happen is therefore

$$\frac{y'\,dx'}{CD^2}$$

and if x' is to be between a' and b', the total probability is

$${}_{a'}^{b'}P_{(r,s)} = \frac{1}{CD^2}\int_{a'}^{b'} y'\,dx'$$

where ${}_{a'}^{b'}P_{(r,s)}$ is the probability that the first ball fall between a' and b' and that a ball thrown subsequently r + s times succeed r and fail s times.


But $CD^2$ = Area of AD and $\int_{a'}^{b'} y'\,dx'$ = Area of the shaded portion a'Jb'. Therefore

(7)    $${}_{a'}^{b'}P_{(r,s)} = \frac{\text{Area } a'Jb'}{\text{Area } AD}$$


The probability that the first ball fall between C and D and thereafter there occur r successes and s failures is similarly $\dfrac{\text{Area } CJD}{\text{Area } AD}$. But the first ball must fall somewhere between C and D; therefore the total probability of the second throws having r successes and s failures is given by

(8)    $$P_{(r,s)} = \frac{\text{Area } CJD}{\text{Area } AD}$$

With this established, the analysis proceeds.


Given the result of a series of throws to be r successes and s failures, what is the probability that the first ball has fallen between a' and b'? This we may obtain by the use of the solution already derived and the principle of compound probability¹.


Let x be the desired probability that the first ball fell between a' and b'. We have seen that the probability of r successes and s failures in the second series of throws is

$$\frac{\text{Area } CJD}{\text{Area } AD} \quad \text{from (8)}$$

therefore the probability of the first falling between a' and b' and the experience (r, s) following is

$$x \cdot \frac{\text{Area } CJD}{\text{Area } AD}$$

But we have shown that this combined probability is equal to

$$\frac{\text{Area } a'Jb'}{\text{Area } AD} \quad \text{from (7)}$$

Therefore

(9)    $$x = \frac{\text{Area } a'Jb'}{\text{Area } CJD}$$

1. This step is very elaborately proved in Bayes' original paper by a circuitous demonstration.










This is Bayes' Theorem, as its author gives it. The additional part of his work is concerned with the quantitative estimate of the ratio.


We may now show that his solution is the same as that given in (3), as follows:

(10)    $$y' = CD \times E_{r,s}\left(\frac{x'}{CD}\right)^r\left(1 - \frac{x'}{CD}\right)^s \quad \text{from (6)}$$

where

x' = distance from C to x'

$E_{r,s} = \dfrac{(r+s)!}{r!\,s!}$

a' = a × CD

b' = b × CD

a and b having the meaning of equation (3). Assume the relationship

(11)    x' = CD × x

(12)    dx' = CD × dx


Then

$$\text{Area } a'Jb' = \int_{x'=a'}^{x'=b'} y'\,dx' = CD^2 \times E_{r,s}\int_{x=a}^{x=b} x^r (1-x)^s\, dx$$

(substituting from (11) and (12)).

Similarly

$$\text{Area } CJD = CD^2 \times E_{r,s}\int_{x=0}^{x=1} x^r (1-x)^s\, dx$$

Therefore

$$\frac{\text{Area } a'Jb'}{\text{Area } CJD} = \frac{\int_a^b x^r (1-x)^s\, dx}{\int_0^1 x^r (1-x)^s\, dx}$$

which is the same as formula (3) previously derived.





To be directly applicable to statistical problems formula (5) must 
be numerically evaluated. This is accomplished exactly for most prac- 
tical instances only with a great amount of labor, and methods of 
approximation have been resorted to. For a few simple special cases 
the solution may be easily derived as follows: 


An event has been tried N times with p successes and q failures.
What is the probability that in the next single trial it will succeed ? 








Applying formula (5) to this instance, we have

r = p    m = 1
s = q    n = 0

and the desired probability is given by

$$P = \frac{\int_0^1 x^{p+1}(1-x)^q\, dx}{\int_0^1 x^p (1-x)^q\, dx}$$

Now

$$\int_0^1 x^a (1-x)^b\, dx = \frac{a!\,b!}{(a+b+1)!}$$

From which we have

$$P = \frac{p+1}{p+q+2} = \frac{p+1}{N+2}$$





So that if nothing is known concerning an event except that it has been tried three times and succeeded twice, the probability that it will succeed in the next trial is 3/5, not 2/3 as the more usual procedure would indicate. Again, if an event has occurred a thousand times without a failure, and we know concerning it nothing except that fact, the probability that it will fail next instance is 1/1002. If an event has never been tried at all, the probability that it will succeed on the first trial is 1/2.
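The three figures just quoted can be checked directly from formula (5). The sketch below is illustrative only; the function names are not from the original paper. It uses the closed form of the integral of $x^a(1-x)^b$ from 0 to 1, namely a! b!/(a+b+1)!, with exact fractions.

```python
from fractions import Fraction
from math import comb, factorial


def beta_integral(a, b):
    """Integral of x**a * (1 - x)**b over [0, 1], equal to a! b! / (a + b + 1)!."""
    return Fraction(factorial(a) * factorial(b), factorial(a + b + 1))


def predictive(m, n, r, s):
    """Formula (5): probability of a second sample (m, n) after a first sample (r, s)."""
    return comb(m + n, m) * beta_integral(m + r, n + s) / beta_integral(r, s)


print(predictive(1, 0, 2, 1))     # tried three times, succeeded twice: 3/5
print(predictive(0, 1, 1000, 0))  # a failure after 1000 straight successes: 1/1002
print(predictive(1, 0, 0, 0))     # never tried at all: 1/2
```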









An event has been tried N times and succeeded each instance. What is the probability that in the next d trials it will again succeed each time? Here

r = N    m = d
s = 0    n = 0

and the desired probability is given by

$$P = \frac{\int_0^1 x^{N+d}\, dx}{\int_0^1 x^N\, dx} = \frac{N+1}{N+d+1}$$











From this we conclude that if an event has succeeded 25 times and never failed, the probability that in 25 further trials it will again not fail even once is 26/51, or in general if an event has never failed in N trials, the probability that N further trials will yield no failure is about 1/2.
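A short numerical check of this result, as a sketch with an illustrative function name: the ratio of the two integrals reduces to (N+1)/(N+d+1), which for d = N approaches 1/2 as N grows.

```python
from fractions import Fraction


def run_continues(N, d):
    """Probability of d further successes after N successes and no failures.

    The integral of x**k over [0, 1] is 1/(k + 1), so the ratio of integrals
    is (N + 1) / (N + d + 1).
    """
    return Fraction(1, N + d + 1) / Fraction(1, N + 1)


print(run_continues(25, 25))  # 26/51
for N in (10, 100, 1000):
    # The value for d = N tends to 1/2 from above as N increases
    print(N, float(run_continues(N, N)))
```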










Discussion. 






To precisely what position in the methodology of applied statistics Bayes' Theorem will eventually become adjusted, it is impossible at this point in its development to say with certainty. The literature on the subject, as soon as it leaves the realm of purely hypothetical situations, is rife with disagreement, and clarification remains a contemporary problem. In this brief presentation, no attempt can be made to adequately summarize the various views concerning the questions at issue. We may, however, consider a few points that have disciplinary value for statistical thinking rather than any immediate practical utility.











It is basic to the aims of statistical calculations to estimate the probability of given experiences from assumptions of pure random variation. A consideration of the logic involved in the development of Bayes' Theorem is useful in bringing out the inadequacy of the reasoning by which our most ordinary statistical procedures attempt to accomplish this. If, having observed a probability p, we estimate the standard deviation of succeeding samples of n by $\sqrt{npq}$, we imply tacitly that in the universe from which the sample was drawn, the chance of a success is the p of our observation. The reasoning leading to formula (3), and the formula itself, indicate how unwarranted
this is. Our knowledge of the universe which generated the sample is 
never given with certainty by the sample. Indeed, formula (3) states 
a probability for any particular universe that may be assumed. With 
only a sample as the source of knowledge, and without Bayes’ Theorem, 
we have no clue as to the nature of the generating universe. But,
if we do not know the universe, how are we to calculate the character 
of its samples? One answer is to take refuge in formula (5), i. e. 
use Bayes’ Theorem. As a practical solution of the difficulty this has 
two major objections: first, there are no existing tables for making 
the necessary calculations without prohibitive arithmetic labor ; second, 
even if the evaluation could be effected there are reasons to doubt 
the validity of its application. For the formula in question rests on 
the assumption that all the probabilities from zero to unity which might 
characterize the universes from which we draw samples are a priori 
equally likely, the so-called assumption of the equal distribution of ignorance. Now this is an exceedingly questionable assumption, and it is
partly on these grounds that Keynes rejects outright the possibility 
of applying probability to actual experience. It must be admitted, we 
think, that it is difficult to see what there is to justify the assumption 
that every sort of general universe from which arise the events of 
experience is equally likely. Would it not appear the more reasonable 
hypothesis that these universes are themselves “events,” samples of 
some larger universe; and why should this be extremely different in 
the distribution of its probabilities from the universes that we ordinarily meet? There are writers, however, who, admitting that the assumption is to be questioned, believe it may be subjected to experimental
test, and have essayed to actually sample at random the probabilities 
that characterize the universes of our experience. It would be im- 
pertinent to assert that an experimental investigation is bound to be 
futile, but the utility of this sort of procedure seems to us exceedingly
dubious. We doubt indeed that any clear meaning can be assigned to 
the concept of “the universes of our experience,” of which random 
samples are to be obtained. But granting the existence of such a distribution of a priori probabilities we doubt the relevancy of its estimation to any practical problem. In any actual investigation, we
deal with a definite slice of possible experience; an anthropologist is 
not concerned with the universes dealt with in the investigation of an 
economist or an epidemiologist. If a priori probabilities are of inter- 
est to him, they are those that obtain in his peculiar world of observa- 
tion. It appears to us quite as wide of the mark aimed at, to call in 
a formula which obtains its a priori probability from experience in 
general, as to obtain it from the unique experience at hand, and indeed 
it may be argued that, as between the two, the latter is the more 
reasonable. 


What then does all this come to? Does it mean that the entire 
structure of established statistical procedure rests on quicksand, to be 
toppled over by anyone armed with a reading of Bayes’ Theorem? 
We are inclined to the belief held by Keynes that, so far as logic is 
concerned, this is substantially true. As regards this, however, it is 
at bottom in no worse plight than any current scientific procedure 
when its fundamental assumptions are hard pressed. But we do not 
rest the matter here. All this admits is that applied statistics, like 
all applied science, is not founded on unquestionable premises and in- 
vulnerable logic. It is perfectly consistent to add that in general its 
formulae are good approximations. How good? This is a question 
permitting no dogmatic comprehensive answer. Differently good for 
different situations. Some idea of the degree of approximation may 
be obtained for given assumed conditions by direct calculation. It 
may be shown, for instance, that under certain conditions results ob- 
tained by way of Bayes’ Theorem or the more usual “normal” dis- 
tribution render not very different results, and these conditions, indeed, 
approach the ones we most frequently encounter. But, in general, a 
more satisfactory answer is furnished in the pragmatic consideration 
that our formulae have in fact been widely used and experience has 
not violated their anticipations. This is the fact that we would stress, 
because it throws into relief the experimental as opposed to the mathematical foundation of statistics. Comforted on the one hand that
experience in general supports our procedures, the considerations we 
have elicited in this discussion will emphasize equally their shifting 
approximation. The clear minded and careful worker will keep this 
constantly in mind and shun literal interpretation of conclusions drawn 
from formulae applied to extreme cases. No scientist worth his salt
will permit himself the use of formulae the premises of which he has 
not examined. But the statistician, because of the great variability of the data with which he is likely to deal, stands in special need of this precaution. Where statistics run counter to what appears to be the general experience, it is a wise rule to re-examine the statistics rather than to indict forthwith the dependability of the experience. Such an attitude would modify considerably much that is found in current statistical literature and it would modify it in the direction of greater soundness.


REFERENCES 
1. Bayes, Thomas. Phil. Trans., 1763, LIII, 370; 1764, LIV, 296.
2. Coolidge, Julian L. Probability, Chapter VI.
3. DeMorgan, Augustus. An Essay on Probabilities, Chapter III.
4. Keynes, Maynard. Treatise on Probability, Chapter XXX.
5. Pearson, Egon. Biometrika, 1925, XVII, 388.
6. Pearson, Karl. Phil. Mag., 1927, 13, 365.
7. Todhunter, I. A History of the Mathematical Theory of Probability, Chapter XIV.
8. Wishart, John. Biometrika, 1927, XIX, 1.










A MATHEMATICAL THEORY OF SEASONALS 


By 


STATISTICAL DEPARTMENT, DetroIT Epison Co. 


The graph of any time series may be assumed to be a compound 
curve which is dependent upon the following factors: 


Secular trend, f(x)
Cycle, c(x)
Seasonal, s(x), and
Residual errors, $\epsilon_x$


If we designate the xth term of the observed time series by ${}_oy_x$, we have that


(1)    $${}_oy_x = f(x)\cdot c(x)\cdot s(x) + \epsilon_x$$


It also follows that the standard error, based on our hypothesis, is


(2)    $$\sigma_\epsilon = \sqrt{\frac{\Sigma\epsilon^2}{N}}$$


In making predictions, we desire that the standard error of estimate be a minimum, and this requires that $\Sigma\epsilon^2$ be also a minimum.


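In dealing with data covering a period of n years, i.e. 12n months, we observe that

$$\begin{aligned}
\Sigma\epsilon^2 ={}& [\,{}_oy_1 - f(1)\cdot c(1)\cdot s(1)]^2 \\
&+ [\,{}_oy_2 - f(2)\cdot c(2)\cdot s(2)]^2 \\
&+ \cdots \\
&+ [\,{}_oy_{12} - f(12)\cdot c(12)\cdot s(12)]^2 \\
&+ [\,{}_oy_{13} - f(13)\cdot c(13)\cdot s(1)]^2 \\
&+ [\,{}_oy_{14} - f(14)\cdot c(14)\cdot s(2)]^2 \\
&+ \cdots \\
&+ [\,{}_oy_{12n-11} - f(12n-11)\cdot c(12n-11)\cdot s(1)]^2 \\
&+ [\,{}_oy_{12n-10} - f(12n-10)\cdot c(12n-10)\cdot s(2)]^2 \\
&+ \cdots \\
&+ [\,{}_oy_{12n} - f(12n)\cdot c(12n)\cdot s(12)]^2
\end{aligned}$$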
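Let us now find the values of s(1), s(2), . . . s(12) that will minimize the standard error of estimate. Placing the partial derivative of $\Sigma\epsilon^2$ with respect to s(1) equal to zero yields

$$\begin{aligned}
\frac{\partial\,\Sigma\epsilon^2}{\partial s(1)} ={}& 2[\,{}_oy_1 - f(1)c(1)s(1)][-f(1)c(1)] \\
&+ 2[\,{}_oy_{13} - f(13)c(13)s(1)][-f(13)c(13)] \\
&+ \cdots \\
&+ 2[\,{}_oy_{12n-11} - f(12n-11)c(12n-11)s(1)][-f(12n-11)c(12n-11)] = 0
\end{aligned}$$

Solving,

$$s(1) = \frac{\sum^{(1)} {}_oy_x\cdot f(x)\cdot c(x)}{\sum^{(1)} f^2(x)\cdot c^2(x)}$$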





where we understand that $\sum^{(1)} {}_oy_x\cdot f(x)\cdot c(x)$ means the sum of the products of ${}_oy_x$, f(x) and c(x) taken from the first month of each year, and similarly for $\sum^{(1)} f^2(x)\cdot c^2(x)$.


The partial derivative with respect to s(2) yields

$$s(2) = \frac{\sum^{(2)} {}_oy_x\cdot f(x)\cdot c(x)}{\sum^{(2)} f^2(x)\cdot c^2(x)}$$

and in fact

(3)    $$s(7) = \frac{\sum^{(7)} {}_oy_x\cdot f(x)\cdot c(x)}{\sum^{(7)} f^2(x)\cdot c^2(x)}$$


Thus the seasonal for July is a function only of the various July 
values of the observed series, the secular trend and the cycle factors. 


Since both f(x) and c(x) are smooth functions, it follows that their product, which we shall designate by $\psi(x)$, represents a smooth function which is merely that part of the time series which would remain if the accidental and seasonal fluctuations were eliminated. The formula for the seasonal index for the i th month may therefore be written

(4)    $$s(i) = \frac{\sum^{(i)} {}_oy_x\cdot\psi(x)}{\sum^{(i)} \psi^2(x)}$$


At this point we may recall the fact that in fitting a curve of the type $y = A\psi(x)$ to observed data by the Method of Least Squares,

$$A = \frac{\Sigma\, y_x\cdot\psi(x)}{\Sigma\, \psi^2(x)}$$

whereas if the Method of Moments be employed

$$A = \frac{\Sigma\, y_x}{\Sigma\, \psi(x)}$$


Experience in various statistical applications demonstrates that the two methods yield approximately the same results. Borrowing from this experience, we shall choose the simpler form and write instead of formula (4)

(5)    $$s(i) = \frac{\sum^{(i)} {}_oy_x}{\sum^{(i)} \psi(x)}$$

So far as theoretical considerations are concerned (4) may be superior to (5), but the fact that the latter formula enables us to obtain seasonals by a method far simpler than would result by using formula (4), requires that we choose (5) in preference to (4). Ordinarily the difference in results obtained by using both formulae is less than one-half of one per cent.
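The two estimators can be compared on a small constructed series. The sketch below is illustrative only: the function names, the three-year length, the trend-cycle values and the twelve seasonal factors are assumptions made for the example, not data from the paper. With no residual errors both formulae recover the seasonal factor used to build the series.

```python
def seasonal_least_squares(y, psi, month):
    """Formula (4): sum of y*psi over sum of psi**2, taken over one calendar month."""
    idx = range(month - 1, len(y), 12)
    return sum(y[x] * psi[x] for x in idx) / sum(psi[x] ** 2 for x in idx)


def seasonal_moments(y, psi, month):
    """Formula (5): sum of y over sum of psi, taken over one calendar month."""
    idx = range(month - 1, len(y), 12)
    return sum(y[x] for x in idx) / sum(psi[x] for x in idx)


# Hypothetical inputs: a smooth trend-cycle product and assumed seasonal factors
true_s = [0.99, 0.93, 1.05, 1.02, 1.00, 0.98, 0.97, 1.00, 1.01, 1.04, 1.00, 1.01]
psi = [1500.0 + 8.0 * x for x in range(36)]        # three years of monthly values
y = [psi[x] * true_s[x % 12] for x in range(36)]   # series without residual errors

for month in (1, 2, 10):
    print(month, seasonal_least_squares(y, psi, month), seasonal_moments(y, psi, month))
```

Once residual errors are added to `y`, the two estimates separate slightly, which is the difference the text puts at less than one-half of one per cent.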


Verbally, formula (5) states merely that the seasonal index for 
any month is the ratio of the total of the variates for the month in 
question to the total that would have been experienced if neither acci- 
dental nor seasonal influences were present. 


We now are forced to find a simple method of obtaining values of $\sum^{(i)}\psi(x)$.


Let $T_{i-3}$, $T_{i-2}$, $T_{i-1}$, $T_i$, $T_{i+1}$, $T_{i+2}$, and $T_{i+3}$ denote the total production for seven consecutive years. If we assume that the effect of both seasonal influences and accidental or residual fluctuations is to shift the production from one month to another, but nevertheless to leave the total production for each year practically unchanged, then a smooth curve passing over the seven year period, and preserving the annual totals, may be assumed to afford a representation of $\psi(x)$. We, therefore, determine the equation of a parabola of the sixth degree in such a manner that the areas under this curve for seven equidistant unit intervals are equal respectively to $T_{i-3}, T_{i-2}, \ldots, T_{i+2}, T_{i+3}$. Fitting sixth degree parabolae to successive seven year intervals it is possible to deal with a time series of any length.
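The fitting step can be sketched as a small linear system. The code below is an illustration under stated assumptions, not the paper's own computing scheme: the seven annual totals are hypothetical, the years are placed on the unit intervals [0, 1], ..., [6, 7], and a month's value of the smooth curve is taken as the area over one twelfth of a year. It solves for the seven coefficients of the sixth-degree polynomial whose area over each unit interval equals the corresponding annual total.

```python
from fractions import Fraction


def fit_area_preserving_polynomial(totals):
    """Coefficients a_0..a_{k-1} of a polynomial whose area over [j, j+1] is totals[j]."""
    k = len(totals)
    # Row j states: sum_p a_p * ((j+1)**(p+1) - j**(p+1)) / (p+1) = totals[j]
    rows = [
        [Fraction((j + 1) ** (p + 1) - j ** (p + 1), p + 1) for p in range(k)]
        + [Fraction(totals[j])]
        for j in range(k)
    ]
    # Gauss-Jordan elimination in exact rational arithmetic
    for col in range(k):
        pivot = next(i for i in range(col, k) if rows[i][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        pivot_value = rows[col][col]
        rows[col] = [v / pivot_value for v in rows[col]]
        for i in range(k):
            if i != col and rows[i][col] != 0:
                factor = rows[i][col]
                rows[i] = [vi - factor * vc for vi, vc in zip(rows[i], rows[col])]
    return [rows[i][k] for i in range(k)]


def area(coeffs, lo, hi):
    """Area under the polynomial between lo and hi."""
    lo, hi = Fraction(lo), Fraction(hi)
    return sum(c * (hi ** (p + 1) - lo ** (p + 1)) / (p + 1) for p, c in enumerate(coeffs))


totals = [120, 132, 141, 150, 163, 171, 180]  # hypothetical annual totals
coeffs = fit_area_preserving_polynomial(totals)

# Assumed reading of "interpolated monthly value": area over each twelfth of a year
monthly = [area(coeffs, Fraction(m, 12), Fraction(m + 1, 12)) for m in range(84)]
january_sum = sum(monthly[0::12])  # sum of the January values over the seven years
print(float(january_sum))
```

Because the monthly values are linear in the annual totals, a sum such as `january_sum` is a fixed linear combination of the totals, which is what makes a table of constant coefficients possible.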





By adding together the interpolated values for all the January values of $\psi(x)$, and similarly for the other months, we can show that

(6)    $$\sum^{(i)}\psi(x) = C_{i,1}T_1 + C_{i,2}T_2 + C_{i,3}T_3 + C_{i,4}\,[\,T_4 + T_5 + \cdots + T_{n-3}\,] + C_{i,5}T_{n-2} + C_{i,6}T_{n-1} + C_{i,7}T_n$$

where the values of the coefficients are as given in Table I.


In order to compare the efficiency of this method with another 
method of computing seasonals, it is necessary that each formula be 
tried out on some series for which the true values of the seasonal indices 
are known. We know in advance, of course, that there exist many 
satisfactory methods of obtaining seasonals, but we also desire to know 


something about the amount of time that each method requires as 
well as their relative accuracy. 


The theoretical series, on which we shall try out two methods of 
computing seasonals, is built up from data taken from an article, 
“Statistical Analysis and Projection of Time Series,” written and pub- 
lished by the statistical division of the American Telephone and Tele- 
graph Company. After eliminating from the Production of Pig Iron 
series both trend and seasonal influences, the factors of Table II re- 
mained. We shall consider these, therefore, as the combination of 
“Cycle and residual” factors. 


Although smoothing this data by a proper mathematical formula 
would eliminate the residual errors, nevertheless such procedure would 
introduce a bias in favor of the formula for computing seasonals that 
is proposed in this paper. The reason for this bias lies in the fact that 
most smoothing formulae are developed on the assumption that the 
smoothed ordinate lies on a parabola of a chosen degree, and since 
a similar assumption was made in our theory, it is evident that the 
proposed method will benefit most by employing a parabolic smoothing 
formula in obtaining the hypothetical cycle series. 


For this reason the data of Table II, with additional data for
one year on either side, was given to a draftsman with instructions to 


(1) Plot the data of Table II 


(2) draw free hand a smooth curve that to his mind best rep- 
resented the general run of the data 


(3) read off from his curve the approximate value of the
smoothed statistics. 


The data of Table III resulted. 


In essential agreement with the American Telephone and Telegraph article, we shall assume a linear trend, the value for the first month being 1511 and the monthly increment 8. The product of trend by cycle produces the theoretical values of $\psi(x)$ presented in Table IV.


TABLE I

Constants for computing seasonal indices

[Table values not recoverable from the source.]


TABLE II

Cycle and Residual Series for Pig Iron Production

[Monthly values, January through December, not recoverable from the source.]


TABLE III

Per Cent Cycle Series for Theoretical Distribution

[Table values not recoverable from the source.]


TABLE IV

Theoretical Trend and Cycle Series, $\psi(x)$

[Table values not recoverable from the source.]


TABLE V

Theoretical Seasonal Factors

January     .99
February    .93
March      1.05
April      1.02
October    1.04

[Values for the remaining months not recoverable from the source.]





By multiplying the data of Table IV by the seasonals of Table V, 
a theoretical series would be obtained which would comprise the ele- 
ments of trend, cycle and seasonal—lacking only chance or residual 
errors. 


In order to obtain a series of chance factors that might serve as 
residual error factors, sixty cards were marked with integers totaling 
1200. The cards were distributed, after shuffling, into twelve piles 
of five cards each, and the totals of each pile noted. These were taken
as the residual factors for the first year, and the process was repeated 
for the following years. The chance factors of Table VI resulted. 
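The card procedure is easy to imitate. In the sketch below the individual card values are an assumption (the paper states only that sixty cards totaled 1200), and the function name and seed are illustrative. Since every card is dealt each year, the twelve pile totals always sum to 1200, so the residual factors average 100 per month.

```python
import random


def residual_factors(cards, years, seed=0):
    """Shuffle the deck once per year and total twelve piles of five cards each."""
    rng = random.Random(seed)
    table = []
    for _ in range(years):
        deck = list(cards)
        rng.shuffle(deck)
        # Piles are consecutive runs of five cards in the shuffled deck
        table.append([sum(deck[i:i + 5]) for i in range(0, 60, 5)])
    return table


# Hypothetical deck: sixty integers from 18 to 22, twelve of each, totaling 1200
cards = [20 + (i % 5) - 2 for i in range(60)]

for year, piles in enumerate(residual_factors(cards, years=3), start=1):
    print(year, piles, sum(piles))
```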


Making allowance for residual errors as well as the seasonal fac- 
tors, we obtain finally the theoretical series which we shall attempt 
to analyze, Table VII. 


If the various methods of analyzing time series are sound, they 
should be able to break up this series into its elementary components— 
trend, cycle, seasonal and residual errors. A comparison of the results 
by different methods should indicate to some extent their respective 
merits. In attacking the ordinary observed time series by different 
methods and comparing results the difficulty is to tell, when all has 
been done, which of the methods is best. Unfortunately, if they disagree, we do not know which one is nearest the truth. Our theoretical
series, however, enables us to compare results obtained by different 
methods, since we know the answers in advance, and also will serve 
students as a detailed example of time series synthesis. 





TABLE VI

Residual Factors

[Only two columns survive in part; January and December values and the column headings are not recoverable from the source.]

February    106
March        89
April        99
May          97
June         94
July        103
August      102
September   106
October     104
November     97

February     89
March        92
April        83
May         108
June        113
July        103
August       98
September   105
October     107
November    102


TABLE VII

Theoretical Series

[Table values not recoverable from the source.]


To obtain the values of the seasonal factors by means of formula (5) and Table I we need only observe that for the theoretical series

$T_1$ = 16061
$T_2$ = 22153
$T_3$ = 24407
$T_4 + T_5 + \cdots + T_{n-3}$ = 145291
$T_{n-2}$ = 3489
$T_{n-1}$ = 21933
$T_n$ = 29930

Consequently we have

TABLE VIII

Seasonals by Interpolation Method

[Table values not recoverable from the source.]











It is interesting to compare the seasonals of Table VIII with the 
corresponding set obtained by the method of “link relatives.” The 
following table presents the series of link relatives for the theoretical 
series of Table VII. 





TABLE IX

Link Relatives for the Series of Table VII

[Only the following entries survive; the remaining values, January through October, are not recoverable from the source.]

November    1.013
December    1.018


From the above we obtain the following: 





TABLE X

Link Relative Seasonal Indices

(1) Medians    (2) Chain    (3) (2) Adjusted

[Table values not recoverable from the source.]


The following exhibit of the results obtained by the two methods 
is interesting. 





TABLE XI

Comparison of Interpolation and Link Relative Methods

Columns: Actual Values; Interpolation Method (Seasonal, Error); Link Relative (Seasonal)

[Table values not recoverable from the source.]


The mean deviations and the standard deviations of the errors show that both methods are about equally effective; the advantage in accuracy of the interpolation method is scarcely worth mentioning. Nevertheless, the fact that its results are obtained with but a trivial amount of labor is important.


Mean Standard 
Deviation Deviation 
of Errors of Errors 


Interpolation Method 
Link Relative Method 





STIELTJES INTEGRALS IN MATHEMATICAL 
STATISTICS 


By 


J. SHOHAT
(Jacques Chokhate) 


Introduction. Stieltjes integrals, introduced into analysis in 
1894-5¹, play an increasingly important role not only in pure mathematics, but also in theoretical physics and in the theory of probability.
In mathematical statistics, however, their use, it seems, still remains 
very limited. And yet, one of the most remarkable features of 
Stieltjes integrals is that they represent, as the case may be, an integral 
proper or a sum of a finite or an infinite number of discrete aggregates. Thus the statistician is enabled to treat in a single formula a
continuous, as well as a discontinuous distribution. This means far 
more than a mere simplification of writing. In fact, since Stieltjes 
integrals have many properties in common with Riemann and Lebesgue 
definite integrals, we can use all known resources of the theory of 
definite integrals (mean-value theorem, various inequalities), and 
therefore readily obtain general results which, otherwise, require 
special (often complicated) proofs. The advantage of such a treat- 
ment is particularly evident in the theory of interpolation, approxima- 
tion, and mechanical quadratures. 


Hence, the object of this paper is to present a general exposition of the properties and applications of Stieltjes integrals. Many of the results stated below are well known², and the proofs may be omitted. Some results are believed to be new (for example, extension of Tchebycheff and Hölder inequalities) and may prove useful in mathematical statistics. We close, as an illustration, with the theory of interpolation, for here, even in recently published books, the continuous and discontinuous cases are treated separately while the underlying ideas are identical.


1. Stieltjes: (a) Recherches sur les fractions continues, Oeuvres, v. II, p. 402-559; (b) Correspondance d'Hermite et de Stieltjes, v. II, p. 272, where these integrals are first mentioned in a letter (No. 351) to Hermite under date of October 25, 1892.

2. (a) Hobson, The Theory of Functions of a Real Variable, 2d ed. (1921), v. I, p. 506-16, 605-09; (b) O. Perron, Die Lehre von den Kettenbrüchen (1913), p. 362-69.













I. Definition and general properties. Let f(x) be continuous and $\psi(x)$ be bounded monotonic non-decreasing on the finite interval (a, b) (a < b). Then, as is well known, the following limits exist:

$$\psi(x+0) = \lim_{\epsilon\to+0}\psi(x+\epsilon), \qquad \psi(x-0) = \lim_{\epsilon\to+0}\psi(x-\epsilon) \qquad (a \le x \le b)$$

If x is a point of discontinuity of $\psi(x)$, $\psi(x+0) - \psi(x-0)$ (> 0) is called "saltus" of $\psi(x)$ at this point. The number of such points is at most denumerably infinite; the points of continuity of $\psi(x)$ are, therefore, everywhere dense in (a, b). $\psi(x)$ is R-integrable, and so is $\psi(x)x^k$ (k = 0, 1, ...). The Riemann-Stieltjes integral (of f(x) with respect to $\psi(x)$), $\int_a^b f(x)\,d\psi(x)$, is defined as follows:

(S)    $$\int_a^b f(x)\,d\psi(x) = \lim_{n\to\infty}\sum_{i=0}^{n-1} f(\xi_i)\,[\psi(x_{i+1}) - \psi(x_i)]$$

$$a = x_0 < x_1 < x_2 < \cdots < x_{n-1} < x_n = b$$

$$x_i \le \xi_i \le x_{i+1} \qquad (i = 0, 1, \ldots, n-1)$$


The existence of the right-hand limit can be easily established. The continuity of f(x) is here sufficient, but not necessary¹.
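Definition (S) translates directly into a finite sum. The following is a minimal numerical sketch, assuming an equally spaced subdivision and taking each $\xi_i$ at the midpoint of its subinterval; the function name and the test functions are illustrative choices, not part of the original text.

```python
def stieltjes_sum(f, psi, a, b, n):
    """Approximate the Riemann-Stieltjes integral of f with respect to psi on [a, b].

    Uses n equal subintervals and evaluates f at each midpoint, as one
    admissible choice of the intermediate points in definition (S).
    """
    total = 0.0
    for i in range(n):
        x_lo = a + (b - a) * i / n
        x_hi = a + (b - a) * (i + 1) / n
        xi = 0.5 * (x_lo + x_hi)
        total += f(xi) * (psi(x_hi) - psi(x_lo))
    return total


# Illustration: f(x) = x and psi(x) = x**2 on (0, 1).
# Since psi is differentiable, the integral equals the ordinary integral
# of x * 2x over (0, 1), that is 2/3.
approx = stieltjes_sum(lambda x: x, lambda x: x * x, 0.0, 1.0, 10_000)
print(approx)
```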













In many phases of mathematical statistics the case of a continuous f(x) is evidently the more important, although many problems arising in the theory of probability require applications of the discontinuous case.





From the very definition (S) one may obtain many properties of Stieltjes integrals in common with the ordinary definite integrals. Thus:

(1)    $\int_a^b d\psi(x) = \psi(b) - \psi(a)$

(2)    $\int_a^b f\,d\psi = \int_a^c f\,d\psi + \int_c^b f\,d\psi$    (a < c < b)

(3)    $\int_a^b (f_1 \pm f_2)\,d\psi = \int_a^b f_1\,d\psi \pm \int_a^b f_2\,d\psi$

(4)    $\int_a^b A f\,d\psi = A\int_a^b f\,d\psi$    (A = Const.)

(5)    $\left|\int_a^b f\,d\psi\right| \le \int_a^b |f|\,d\psi$

(6)    $\int_a^b f\,d\psi = f(\xi)\int_a^b d\psi$    ($a \le \xi \le b$; mean-value theorem)

(7)    $\int_a^b f\,d\psi \le \int_a^b f_1\,d\psi$, if $f(x) \le f_1(x)$ for $a \le x \le b$

(8)    $\int_a^b \sum_i f_i\,d\psi = \sum_i \int_a^b f_i\,d\psi$, if $\sum_i f_i(x)$ converges uniformly in (a, b)

(9)    $\int_a^b f\,d\psi = f\psi\,\Big|_a^b - \int_a^b \psi\,df$    (integration by parts)

(10)    $\int_a^b f\,d\psi = \int_a^b f(x)\,p(x)\,dx$, if $\psi(x) = \int_a^x p(x)\,dx + c$ with $p(x) \ge 0$ in (a, b).

(10-bis)    $\int_a^b f\,d\psi = \int_a^b f(x)\,\psi'(x)\,dx$, if $\psi'(x)$ exists and is R-integrable in (a, b).


Let $\psi(x)$ have only a finite number of points of increase in (a, b),

$$(a = x_0 <)\ x_1 < x_2 < \cdots < x_n\ (< x_{n+1} = b)$$

with the saltus $\sigma_i$ at $x = x_i$ (i = 1, 2, ..., n), so that $\psi(x)$ remains constant for $x_i < x < x_{i+1}$, and $\sigma_i > 0$. Such functions, called stepwise functions ("fonction en escalier"), prove very useful. Here

(11)    $$\int_a^b f\,d\psi = \sum_{i=1}^{n}\sigma_i f(x_i) \qquad \{\sigma_i = \psi(x_i+0) - \psi(x_i-0)\}$$

If the number of points of increase is infinite,

$$(a <)\ x_1 < x_2 < \cdots < x_n < \cdots, \qquad \lim x_n = b$$

(12)    $$\int_a^b f\,d\psi = \sum_{i=1}^{\infty}\sigma_i f(x_i).$$


Conversely, any sum $\sum_{i=1}^{n} u_i v_i$ can be represented as a Stieltjes integral in infinitely many ways. Let us introduce positive numbers $\sigma_1, \sigma_2, \ldots, \sigma_n$, a certain interval (a, b), n points $(a <)\ x_1 < \cdots < x_n\ (< b)$ (the choice of $x_i$, $\sigma_i$ depends upon the nature of the problem involved), and a stepwise function $\psi(x)$ having at $x = x_i$ a saltus $\sigma_i$ (i = 1, 2, 3, ..., n). Then, writing $u_i = \sigma_i w_i$, we may consider $v_i$, $w_i$ as values taken respectively by some functions f(x), h(x) at $x = x_i$ (i = 1, 2, ..., n). Hence,

(13)    $$\sum_{i=1}^{n} u_i v_i = \int_a^b h(x)\,f(x)\,d\psi(x)$$


Formulae (11-13) show clearly the use of Stieltjes integrals for the 
representation of sums of discrete aggregates. 
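The passage from integral to sum in formula (11) can be seen numerically. The sketch below is illustrative: the points of increase, the saltus values and the integrand are assumed for the example, and the stepwise function is taken continuous from the right. A Riemann-Stieltjes sum over a fine subdivision is compared with the weighted sum of f at the points of increase.

```python
def make_step_function(points, saltus):
    """Stepwise function that rises by saltus[i] at points[i] (right-continuous)."""
    def psi(x):
        return sum(s for p, s in zip(points, saltus) if p <= x)
    return psi


def stieltjes_sum(f, psi, a, b, n):
    """Riemann-Stieltjes sum on n equal subintervals, f evaluated at midpoints."""
    total = 0.0
    for i in range(n):
        x_lo = a + (b - a) * i / n
        x_hi = a + (b - a) * (i + 1) / n
        total += f(0.5 * (x_lo + x_hi)) * (psi(x_hi) - psi(x_lo))
    return total


# Hypothetical discrete distribution on (0, 1)
points = [0.2, 0.5, 0.9]
saltus = [0.3, 0.5, 0.2]


def f(x):
    return x * x


psi = make_step_function(points, saltus)
as_integral = stieltjes_sum(f, psi, 0.0, 1.0, 10_000)
as_sum = sum(s * f(p) for p, s in zip(points, saltus))  # right-hand side of (11)
print(as_integral, as_sum)
```

Only the subintervals that contain a point of increase contribute to the Riemann-Stieltjes sum, which is why the integral collapses to the discrete sum as the subdivision is refined.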


(14)    $\int_a^b f\,d\psi \ge 0$, if $f(x) \ge 0$ in (a, b)

Here "=" takes place if and only if $\psi(x)$ has a finite or denumerably infinite number of points of increase in (a, b) (not everywhere dense) and f(x) vanishes at all these points, for we exclude, of course, functions f(x) which vanish at all points of continuity of $\psi(x)$ and therefore vanish identically in (a, b). If $\psi(x)$ has infinitely many points of increase, while f(x) vanishes in (a, b) only a finite number of times, without changing sign, then $\int_a^b f(x)\,d\psi(x) \ne 0$ and has the sign of f(x).


(15)    $\int_a^b f(x)\,x^k\,d\psi(x) = 0$ (k = 0, 1, ..., n−1) implies: f(x) has at least n distinct roots inside (a, b), assuming that $\psi(x)$ has at least n points of increase¹.

1. This is a form of a theorem due to Perron (l.c., p. 308-69). If the number of such points is m < n, (15) shows only that f(x) vanishes at all such points.





(16) $\int_a^b x^k\,d\psi(x) = 0$ $(k = 0, 1, \ldots)$ implies: $\psi(x)$ constant for $(a \le x \le b)$¹.


Since in the definition (S) only the differences $\psi(x_{i+1}) - \psi(x_i)$ enter, it follows that a Stieltjes integral does not change its value if we replace $\psi(x)$ by $\psi(x) + c$. More precisely:

(17) $\int_a^b f\,d\psi_1 = \int_a^b f\,d\psi_2$,

if the two monotonic non-decreasing functions $\psi_{1,2}(x)$ differ by an additive constant only at all points of continuity. Applying the mean-value theorem to $\int f\,d\psi(x)$, we conclude:


(18) $F(x) = \int_a^x f(t)\,d\psi(t)$ is continuous at all points of continuity of $\psi(x)$ and therefore, almost everywhere in $(a, b)$.

(19) $\lim_{h \to 0} \dfrac{F(x+h) - F(x)}{\psi(x+h) - \psi(x)} = f(x)$.

(20) $F'(x) = f(x)\,\psi'(x)$ at any point in $(a, b)$ where $\psi'(x)$ exists.

One recognizes in (18-20) a generalization of the properties of the ordinary definite integral, which is a special case, for $\psi(x) = x$.


(21) $g(t) = \int_a^b f(x, t)\,d\psi(x)$ is continuous in $t$ $(t_0 \le t \le t_1)$

if $f(x, t)$, continuous in $x$, is uniformly continuous with respect to $t$ $(t_0 \le t \le t_1)$ for all values of $x$ in $(a, b)$. Moreover,

(22) $\dfrac{dg(t)}{dt} = \int_a^b \dfrac{\partial f(x, t)}{\partial t}\,d\psi(x)$

if $\dfrac{\partial f(x, t)}{\partial t}$ exists and is continuous in $x$ and uniformly continuous in $t$ $(a \le x \le b;\ t_0 \le t \le t_1)$.

1. If $\psi(x)$ has a finite number, $n$, of points of increase, then $n$ such relations imply the same conclusion.














Notes. (i) The above results hold, with proper limitations and modifications, if $\psi(x)$ be of bounded variation in $(a, b)$, for such a function can be represented as a difference of two monotonic non-decreasing functions $\psi_{1,2}(x)$ and we define in accordance with (S),

$\int_a^b f\,d\psi = \int_a^b f\,d\psi_1 - \int_a^b f\,d\psi_2$.

(ii) In applications to probability and mathematical statistics $\psi(x)$ stands for the "cumulative law of distribution," so that


















(23) $\psi(x)$ is monotonic non-decreasing from $\psi(a) = 0$ to $\psi(b) = 1$.

(24) For $(a \le c < d \le b)$ the integral $\int_c^d d\psi(x)$ = probability $P[c \le x \le d]$; $\int_a^b d\psi(x) = 1$.

(25) $\int_a^b f(x)\,d\psi(x) = E(f)$, i. e., the expected value or mathematical expectation of $f(x)$.
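Formula (25) treats discrete and continuous probability at once. A minimal sketch follows, assuming a hypothetical mixed law with a saltus of 0.3 at $x = 0$ and a uniform density of total mass 0.7 on $(0, 1)$; the function names are illustrative only, and the integral is approximated by the sums entering the definition of the Stieltjes integral.

```python
def psi(x):
    """Hypothetical cumulative law: jump 0.3 at x = 0, then uniform mass 0.7 on (0, 1)."""
    if x < 0.0:
        return 0.0
    if x > 1.0:
        return 1.0
    return 0.3 + 0.7 * x

def stieltjes_sum(f, psi, a, b, n):
    """Sum of f(midpoint) * [psi(x_{i+1}) - psi(x_i)] over n equal subintervals of (a, b)."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        x0, x1 = a + i * h, a + (i + 1) * h
        total += f(0.5 * (x0 + x1)) * (psi(x1) - psi(x0))
    return total

total_mass = stieltjes_sum(lambda x: 1.0, psi, -1.0, 2.0, 3000)
expected = stieltjes_sum(lambda x: x * x + 1.0, psi, -1.0, 2.0, 3000)
exact = 0.3 * 1.0 + 0.7 * (1.0 / 3.0 + 1.0)   # 0.3 f(0) + 0.7 * integral of f over (0, 1)
print(total_mass, expected, exact)
```

The discrete part and the continuous part are picked up by the same sum, without any separate treatment of the point $x = 0$.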


Let $w(x)$, $f(x)$ be continuous in $(a, b)$, and $\xi(x)$ be of bounded variation. Then,

(26) $\psi(x) = \int_a^x w(x)\,d\xi(x)$ is of bounded variation,¹

$\int_a^b f(x)\,d\psi(x) = \int_a^b f(x)\,w(x)\,d\xi(x)$.


Given an infinite sequence of functions $\psi_n(x)$ $(n = 1, 2, \ldots)$ of bounded variation in $(a, b)$. If the total variation in $(a, b)$ of all $\psi_n(x)$ does not exceed a fixed quantity $M$ independent of $n$, and if, in addition, $\lim_{n \to \infty} \psi_n(x) = \psi(x)$ exists for $a \le x \le b$, then²

(27) $\lim_{n \to \infty} \int_a^b f(x)\,d\psi_n(x) = \int_a^b f(x)\,d\psi(x)$, for any continuous $f(x)$.

Notes. (i) (27) holds true if we know that $\lim_{n \to \infty} \psi_n(x) = \psi(x)$ exists at all points of continuity of $\psi(x)$ and at $x = a, b$.




1. T. Carleman, Leçons sur les équations intégrales singulières à noyau réel et symétrique (Uppsala) (1923), p. 11-12.
2. Page 9 of preceding reference.


(ii) In applications to probability and statistics (27) is of great importance. In fact, consider $\psi_n(x)$ as a sequence of variable laws of distribution approaching, as a limit, a certain fixed law of distribution $\psi(x)$. Then, by (23), the total variation of any $\psi_n(x)$ in $(a, b)$ is 1; (27) thus becomes applicable and shows that under the said conditions the expected value of any continuous function in the variable law of distribution approaches, as $n \to \infty$, its expected value in the limiting law of distribution.
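As a small illustration of note (ii), one may take for $\psi_n(x)$ the discrete law which assigns probability $1/n$ to each of the points $1/n, 2/n, \ldots, 1$; it approaches the uniform law on $(0, 1)$. The sketch below is an assumed example (the choice of $\psi_n$ and of $f(x) = e^x$ is not from the text) and compares the expected values under $\psi_n$ with the limiting value $e - 1$.

```python
from math import exp, e

def expectation_under_psi_n(f, n):
    """E(f) under the discrete law with probability 1/n at each point i/n, i = 1..n."""
    return sum(f(i / n) for i in range(1, n + 1)) / n

limit = e - 1.0                       # integral of exp(x) over (0, 1), the uniform law
errors = [abs(expectation_under_psi_n(exp, n) - limit) for n in (10, 100, 2000)]
print(errors)
```

The errors shrink as $n$ grows, in agreement with (27).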


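II. Stieltjes Integrals Over an Infinite Interval. We define

(28) $\int_a^{\infty} f\,d\psi = \lim_{b \to \infty} \int_a^b f\,d\psi$;  $\int_{-\infty}^{\infty} f\,d\psi = \lim_{a \to -\infty,\ b \to \infty} \int_a^b f\,d\psi$

(similarly $\int_{-\infty}^{b}$), provided the right-hand limits exist as finite numbers. It is assumed that $\int_a^x f\,d\psi$, $\int_a^b f\,d\psi$ exist respectively for any finite $x > a$ and for any finite interval $(a, b)$. For the existence of (28) it is necessary and sufficient that

(29) $\left|\int_x^{x'} f\,d\psi\right| < \epsilon$ for $x', x \ge$ a certain number $x(\epsilon)$,

$\epsilon > 0$ arbitrarily small.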


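One sees readily that

(30) $\int_{-\infty}^{\infty} f\,d\psi$ exists, if $\int_{-\infty}^{\infty} d\psi$ does, and if $f(x)$ is bounded for all real values of $x$.

The first of these conditions is satisfied if $\psi(x)$ is a law of distribution. We notice that any $\int_a^b f\,d\psi$ can be written as $\int_{-\infty}^{\infty} f\,d\psi$, if we agree to take $\psi(x) = \psi(a)$, $\psi(b)$ respectively for $x \le a$, $x \ge b$.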


The formulae given above hold, in general, for infinite limits as well, with the exception of those which require a double limiting process, like 8, 21, 27, etc., where ordinarily additional precautions must be taken in the form of certain assumptions specifying the behaviour of $\psi(x)$ and of other functions involved at infinity. Thus, (8) is not valid in general for $(a, b) = (-\infty, \infty)$ and requires a more detailed discussion. Formulae 21, 22 hold true if we assume, for example, the uniform boundedness and continuity with respect to $t$ of the functions involved for all $x$ in $(-\infty, \infty)$, and also the existence of $\int_{-\infty}^{\infty} d\psi(x)$, i. e. definite values for $\psi(\pm\infty)$.

Formula 17 deserves special attention: in general, it is not true for an infinite interval, as was shown by Stieltjes¹.


III. Approximate Evaluation of Stieltjes Integrals. In practice, as in statistical computations, we evaluate $\int f\,d\psi$ approximately, replacing it by the right-hand member of (S), for a certain chosen $n$. The question arises regarding the error $r_n$ of such an approximation. Let $\omega(\delta)$ represent the modulus of continuity of $f(x)$, i. e.

(31) $|f(x) - f(y)| \le \omega(\delta)$ for $|x - y| \le \delta$ $(a \le x, y \le b)$.

Then, if $x_{i+1} - x_i \le h$ in (S) for $i = 0, 1, \ldots, n-1$, we have

$r_n = \int_a^b f\,d\psi - \sum_{i=0}^{n-1} f(\xi_i)[\psi(x_{i+1}) - \psi(x_i)] = \sum_{i=0}^{n-1} \int_{x_i}^{x_{i+1}} [f(x) - f(\xi_i)]\,d\psi(x)$,

(32) $|r_n| \le \omega(h) \int_a^b d\psi = \omega(h)[\psi(b) - \psi(a)]$  $\{h = \max (x_{i+1} - x_i);\ i = 0, 1, \ldots\}$.

(32) answers the above question for any continuous $f(x)$.

Special Case: Lipschitz condition²:

(33) $|f(x) - f(y)| \le \lambda |x - y|$ $(a \le x, y \le b;\ \lambda = \text{const.})$,
$|r_n| \le \lambda h[\psi(b) - \psi(a)]$.

In (32, 33) we replace $h$ by $h/2$, if $\xi_i$ in (S) is, as usual, the mid-point of the interval $(x_i, x_{i+1})$ $(i = 0, 1, \ldots, n-1)$.
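The bound (33), with $h$ replaced by $h/2$ for mid-points, can be tried on a case where the exact value is known. The sketch below assumes $\psi(x) = x^2$ on $(0, 1)$ and $f(x) = \sin x$, for which $\int_0^1 \sin x \, d(x^2) = 2(\sin 1 - \cos 1)$ and the Lipschitz constant may be taken as 1; these choices and the function name are illustrative only.

```python
from math import sin, cos

def midpoint_stieltjes_sum(f, psi, a, b, n):
    """Right-hand member of the defining sum with xi_i at the mid-point of each subinterval."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) * (psi(a + (i + 1) * h) - psi(a + i * h))
               for i in range(n))

a, b, n = 0.0, 1.0, 50
psi = lambda x: x * x                  # assumed monotonic non-decreasing psi on (0, 1)
approx = midpoint_stieltjes_sum(sin, psi, a, b, n)
exact = 2.0 * (sin(1.0) - cos(1.0))    # integral of sin(x) * 2x over (0, 1)
lipschitz = 1.0                        # |sin'(x)| <= 1
bound = lipschitz * ((b - a) / n / 2.0) * (psi(b) - psi(a))
error = abs(exact - approx)
print(error, bound)
```

The observed error is well inside the bound, which is to be expected since (33) uses nothing beyond the Lipschitz condition.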


It must be noticed, however, that the above considerations are not workable in general on an infinite interval, for here, in place of (31), we ordinarily have the more complicated relation

$|f(x) - f(y)| \le \omega(x, y, \delta)$ $(|x - y| \le \delta)$

where $\omega(x, y, \delta) \to \infty$ with $x$, $y$ (ex.: $f(x) = x^2$). Thus here, in order to obtain an inequality for the error, we must add to the right member of (32), where $a$, $b$ are finite numbers properly chosen, two more terms, namely the upper limits of $\left|\int_{-\infty}^{a} f\,d\psi\right|$ and $\left|\int_{b}^{\infty} f\,d\psi\right|$, which we obtain by means of a suitable hypothesis concerning the behavior of $f(x)$, $\psi(x)$ at infinity.

1. l. c. p. 73, p. 505-06. (17) is closely related to the so-called "Moments-problem": find a monotonic non-decreasing function $\psi(x)$ in $(a, b)$ with infinitely many points of increase, if all its moments $\gamma_k = \int_a^b x^k\,d\psi(x)$ $[k = 0, 1, \ldots]$ are given. This problem, for $(a, b)$ infinite, may be "indetermined," i. e. it may admit infinitely many solutions, while it is always "determined," for finite $(a, b)$. Stieltjes gives the following example:

$\int_0^{\infty} x^k [1 + \lambda \sin (x^{1/4})]\,e^{-x^{1/4}}\,dx = \int_0^{\infty} x^k e^{-x^{1/4}}\,dx$ $[\lambda \text{ constant},\ k = 0, 1, \ldots]$, and $\psi(x) = \int_0^x [1 + \lambda \sin x^{1/4}]\,e^{-x^{1/4}}\,dx$ is monotonic non-decreasing in $(0, \infty)$, if $|\lambda| \le 1$.

2. If $f'(x)$ exists for $a \le x \le b$, then $\lambda$ can be taken equal to max. $|f'(x)|$ in $(a, b)$. If $f(x)$ is given graphically, $\lambda$ can be found roughly as the maximum of the absolute value of the slope in $(a, b)$.


IV. Tchebycheff and Hölder Inequalities for Stieltjes Integrals¹. Hereafter $\psi(x)$ stands for a monotonic non-decreasing function defined on a certain interval $(a, b)$, finite or infinite. Let $f_i(x)$, $\varphi_i(x)$ $[i = 1, 2, \ldots, n]$ be continuous on $(a, b)$². Then we have the following fundamental transformation:

(34) $\left|\int_a^b f_i(x)\,\varphi_j(x)\,d\psi(x)\right|_{i,j=1}^{n} = \dfrac{1}{n!} \int_a^b \cdots \int_a^b \left|f_i(x_j)\right| \cdot \left|\varphi_i(x_j)\right|\,d\psi(x_1) \cdots d\psi(x_n)$,

where $|f_i(x_j)|$, $|\varphi_i(x_j)|$ denote determinants of order $n$ $(i, j = 1, 2, \ldots, n)$.

The proof is very simple for $n = 2$, for we can write

$\int_a^b u(x)\,d\psi(x) \cdot \int_a^b v(x)\,d\psi(x)$ as $\int_a^b \int_a^b u(x_1)\,v(x_2)\,d\psi(x_1)\,d\psi(x_2)$,

and it may readily be extended to any $n$. Formula (34) yields many interesting results by a proper choice of $n$, $f_i$, $\varphi_i$.

1. Cf. my Note: Jacques Chokhate, Sur les intégrales de Stieltjes, Comptes Rendus, v. 189 (1929), p. OTS-20.

2. In case $\psi(x)$ has a finite number of points of increase in $(a, b)$, we require only definite values of all $f_i(x)$, $\varphi_i(x)$ at these points.
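Examples: (i) $n = 2$; $f_1 = \varphi_1$, $f_2 = \varphi_2$:

(35) $\int f_1^2\,d\psi \cdot \int f_2^2\,d\psi - \left[\int f_1 f_2\,d\psi\right]^2 = \tfrac{1}{2} \int\!\!\int [f_1(x) f_2(y) - f_1(y) f_2(x)]^2\,d\psi(x)\,d\psi(y) \ge 0$

Schwarz inequality ("=" only if $f_1$ and $f_2$ are linearly dependent).

(ii) $n = 2$; $f_1 = \varphi_1 = 1$. Write $f$, $\varphi$ in place of $f_2$, $\varphi_2$:

(36) $\int d\psi \int f\varphi\,d\psi - \int f\,d\psi \int \varphi\,d\psi = \tfrac{1}{2} \int\!\!\int [f(x) - f(y)][\varphi(x) - \varphi(y)]\,d\psi(x)\,d\psi(y)$,

$\int d\psi \int f\varphi\,d\psi \gtrless \int f\,d\psi \int \varphi\,d\psi$

Tchebycheff inequality (derived by him for the special case $d\psi = dx$), where $f$, $\varphi$ are any two functions both varying monotonically in $(a, b)$, either in the same sense (sign $\ge$ in (36)) or in the opposite sense (sign $\le$). In (34-37) we may replace $d\psi(x)$ by $p(x)\,dx$ $[p(x) \ge 0$ in $(a, b)]$¹.

(iii) $f_i(x) = x^{i-1}$, $\varphi_i(x) = F(x)\,x^{i-1}$ $[i = 1, 2, \ldots, n]$:

(37) $\Delta_n = \left|\int_a^b F(x)\,x^{i+j-2}\,d\psi(x)\right|_{i,j=1}^{n} = \dfrac{1}{n!} \int_a^b \cdots \int_a^b F(x_1) \cdots F(x_n) \prod_{i<j} (x_i - x_j)^2\,d\psi(x_1) \cdots d\psi(x_n)$.

1. Cf. E. Fischer, Ueber den Hadamardschen Determinantensatz, Archiv für Mathematik und Physik (3), v. 13 (1908), p. 32-49, where (34) is derived for the particular case $d\psi(x) = dx$.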
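The identity behind the Tchebycheff inequality (36) is easy to verify for a stepwise $\psi(x)$, where every integral reduces to a weighted sum by (11). The following sketch uses hypothetical points and jumps and compares the two members of (36), once for two increasing functions and once for functions varying in opposite senses.

```python
from math import exp

def chebyshev_sides(f_vals, phi_vals, sigma):
    """Both members of (36) for a stepwise psi with saltus sigma[i] at the i-th point."""
    mass = sum(sigma)
    lhs = (mass * sum(s * f * p for s, f, p in zip(sigma, f_vals, phi_vals))
           - sum(s * f for s, f in zip(sigma, f_vals))
           * sum(s * p for s, p in zip(sigma, phi_vals)))
    n = len(sigma)
    rhs = 0.5 * sum(sigma[i] * sigma[j]
                    * (f_vals[i] - f_vals[j]) * (phi_vals[i] - phi_vals[j])
                    for i in range(n) for j in range(n))
    return lhs, rhs

xs = [0.0, 0.5, 1.0, 2.0]              # hypothetical points of increase
sigma = [0.1, 0.4, 0.3, 0.2]           # hypothetical positive jumps
same_sense = chebyshev_sides([x * x for x in xs], [exp(x) for x in xs], sigma)
opposite_sense = chebyshev_sides([x * x for x in xs], [-x for x in xs], sigma)
print(same_sense, opposite_sense)
```

In the first case the common value is positive, in the second negative, matching the two signs allowed in (36).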

















The determinant $\Delta_n$ plays an important role in the theory of orthogonal Tchebycheff polynomials (see below). Formula (37) gives an upper limit for $\Delta_n$:

(38) $|\Delta_n| \le \dfrac{1}{n!}\,(b - a)^{n(n-1)}\,M^n \left[\int_a^b d\psi(x)\right]^n$  $[M = \max. |F(x)|$ in $(a, b)]$.

Applying (13) to the above formulae, we get:

(39) $\left(\sum_{i=1}^{n} u_i v_i\right)^2 \le \sum_{i=1}^{n} u_i^2 \sum_{i=1}^{n} v_i^2$  (Cauchy inequality, from (34))

(40) $n \sum_{i=1}^{n} a_i b_i \gtrless \sum_{i=1}^{n} a_i \sum_{i=1}^{n} b_i$

(41) $\sum_{i=1}^{n} v_i \sum_{i=1}^{n} v_i u_i w_i \gtrless \sum_{i=1}^{n} v_i u_i \sum_{i=1}^{n} v_i w_i$  $(v_i > 0)$¹

Formulae 40, 41 follow from (36) by means of (13), with $\sigma_i = 1$ (in (40)), $\sigma_i = v_i$ (in (41)) $[i = 1, 2, \ldots, n]$. The sequences $\{(a_i), (b_i)\}$, $\{(u_i), (w_i)\}$ are assumed to be either increasing or decreasing, the sign $\gtrless$ being chosen as in (36). Thus all these (and many similar) inequalities have the same origin, formula (34). Applying (13) to Hölder-Minkowski inequalities².


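(42) $\sum |a_i b_i| \le \left\{\sum |a_i|^s\right\}^{1/s} \left\{\sum |b_i|^{s/(s-1)}\right\}^{(s-1)/s}$ $(s > 1)$,
$\left\{\sum |a_i + b_i|^s\right\}^{1/s} \le \left\{\sum |a_i|^s\right\}^{1/s} + \left\{\sum |b_i|^s\right\}^{1/s}$ $(s \ge 1)$

we get:

(43) $\int |f\varphi|\,d\psi \le \left\{\int |f|^s\,d\psi\right\}^{1/s} \left\{\int |\varphi|^{s/(s-1)}\,d\psi\right\}^{(s-1)/s}$ $(s > 1)$

(44) $\left\{\int |f + \varphi|^s\,d\psi\right\}^{1/s} \le \left\{\int |f|^s\,d\psi\right\}^{1/s} + \left\{\int |\varphi|^s\,d\psi\right\}^{1/s}$ $(s \ge 1)$

Formula (43), with $\varphi = 1$, $s = s_1/s_2 > 1$ and $f$ replaced by $|f|^{s_2}$, yields:

(45) $\left\{\int |f|^{s_2}\,d\psi\right\}^{1/s_2} \le \left\{\int |f|^{s_1}\,d\psi\right\}^{1/s_1} \left\{\int d\psi\right\}^{1/s_2 - 1/s_1}$ $(s_1 > s_2 > 0)$.

1. Cf. l. c. p. 73, 1-b, pp. 142, 143, 146, 194.

2. F. Riesz, Ueber Systeme integrierbarer Funktionen, Mathematische Annalen, v. 69 (1911), pp. 449-497; p. 456.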











The applications of the above inequalities to the theory of probability 
and mathematical statistics are many. A few illustrations follow: 


, Consider momen e xz” = 86. 
of a ry on oe anf anh som ele be sr 


(-2,@). Here (36) gives (with, <, dp(apfizidx , 2) Axidx/). 









(46) Ugen/'= "felix <5 
[s-/, 2, Sa “—. f(z) = fx) ; 
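(ii) If $c$ denotes a constant, taking in (45) $f(x) = x - c$, and $\psi(x)$ the law of distribution of $x$ over $(a, b)$, so that $\int_a^b d\psi = 1$, we get:

$\left\{\int_a^b |x - c|^{s_2}\,d\psi(x)\right\}^{1/s_2} \le \left\{\int_a^b |x - c|^{s_1}\,d\psi(x)\right\}^{1/s_1}$ for $s_2 < s_1$.

Hence, for any law of distribution over any interval the quantity $\left\{\int_a^b |x - c|^s\,d\psi(x)\right\}^{1/s}$ increases with $s$ for any constant $c$. In particular,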


(47) 


altel fx dy | also if a2 0, 
us| / d v)" . 


1. Paul Lévy, Calcul des probabilités (Paris 1925), p. 157-58.








(iii) Apply (36) to the functions $f(x)$, $\varphi(x)$ both monotonic in $(a, b)$, $\psi(x)$ the same as in (ii):

(48) $E(f\varphi) \gtrless E(f)\,E(\varphi)$ (for the choice of $\gtrless$ see (36))¹.

The same formula (36) gives for any function $f(x)$

(49) $E(f^n) \ge \{E(f)\}^n$ $(n = 2, 3, \ldots)$².

Formula (45) gives with the same $\psi(x)$

(50) $\left\{E(|f|^{s_2})\right\}^{1/s_2} \le \left\{E(|f|^{s_1})\right\}^{1/s_1}$ $(s_2 < s_1)$.
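Inequality (50) says that $\{E(|f|^s)\}^{1/s}$ does not decrease when $s$ increases. A short check on an assumed discrete law (the values and probabilities below are hypothetical) is:

```python
def power_mean(values, probs, s):
    """{E(|f|^s)}^(1/s) for a discrete law with the given values of f and probabilities."""
    return sum(p * abs(v) ** s for v, p in zip(values, probs)) ** (1.0 / s)

values = [-2.0, 0.5, 1.0, 3.0]         # hypothetical values of f
probs = [0.1, 0.4, 0.3, 0.2]           # hypothetical probabilities, summing to 1
exponents = [0.5, 1, 2, 3, 6]
means = [power_mean(values, probs, s) for s in exponents]
print(means)
```

The printed list is non-decreasing and stays below the largest $|f|$, here 3.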






V. Application of Stieltjes Integrals to Some Minimum-Problems. Given a number $m \ge 1$, $M$ finite points $x_1, x_2, \ldots, x_M$, $M$ positive quantities $\sigma_1, \sigma_2, \ldots, \sigma_M$, and a function $f(x)$ with well determined values $f(x_i)$ $(i = 1, 2, \ldots, M)$. Find a polynomial $P_n(x)$, of degree not exceeding $n$ $(\le M - 2)$, minimising the expression

$\sum_{i=1}^{M} \sigma_i |f(x_i) - P_n(x_i)|^m$.

Discuss the behavior of $P_n(x)$ for $m \to \infty$. We introduce a finite interval $(a, b)$, containing in its interior all points $x_i$, and a monotonic non-decreasing step-wise function $\psi(x)$ with the above properties (saltus $\sigma_i$ at $x = x_i$, etc.; see p. 75). Then our problem can be formulated as follows: Find a polynomial $P_n(x)$ of degree not exceeding $n$, minimising the integral $\int_a^b |f(x) - P_n(x)|^m\,d\psi(x)$ $[m \ge 1]$.

Here the advantage of Stieltjes integrals is clearly evident, for the latter problem has been discussed by G. Polya³, D. Jackson³ and the writer³. We know that a solution always exists and is unique for $m > 1$. The behavior of $P_n(x)$, when either or both $m$ and $n$ increase indefinitely, has also been discussed by the above writers. It was found that, if $f(x)$ be continuous in $(a, b)$, then for $n$ fixed and $m \to \infty$, $P_n(x)$ approaches uniformly in $(a, b)$ the polynomial $\Pi_n(x)$, of degree $\le n$, of the best approximation (in Tchebycheff sense¹) to $f(x)$, provided $\psi(x)$ has infinitely many points of increase everywhere dense in $(a, b)$. Furthermore, $\left[\int_a^b |f(x) - P_n(x)|^m\,d\psi(x)\right]^{1/m} \to$ the best approximation $E_n(f) = \max |f(x) - \Pi_n(x)|$ for $a \le x \le b$.

This result has been supplemented by the writer (in a paper which will appear elsewhere), who showed that the above result holds if $\psi(x)$ has a finite number $M$ $(\ge n + 2)$ of points of increase, $\Pi_n(x)$ representing here the polynomial (of degree $\le n$) giving the best approximation to $f(x)$ on the aggregate of the said points of increase of $\psi(x)$. The following cases are of special interest.

1. G. Bohlmann, Formulierung und Begründung zweier Hülfssätze der mathematischen Statistik, Mathematische Annalen, v. 74 (1913), pp. 341-442; p. 374-75.

2. In fact, (36) holds, with sign $\ge$, if $f(x) - f(y)$ and $\varphi(x) - \varphi(y)$ have the same sign for any $x$, $y$ in $(a, b)$, which, of course, is true for $\varphi = f$.

3. (a) G. Polya, Sur un algorithme toujours convergent . . . Comptes Rendus, v. 157 (1913), p. 840-43. (b) D. Jackson, On the Convergence of certain polynomial and trigonometric approximations, Transactions of the American Mathematical Society, v. 22 (1921), p. 158-66. (c) Idem, Note on the Convergence of Weighted Trigonometric Series, Bulletin of the American Mathematical Society, v. 29 (1923), p. 259-63. (d) J. Shohat, On the Polynomial and Trigonometric Approximation, Mathematische Annalen, v. 103 (1929), p. 157-75.


(a) $n = 0$, i. e. find a constant $X_m$ minimizing the sum

$\sum_{i=1}^{M} \sigma_i |f(x_i) - X_m|^m$.

Very simple considerations show that the best approximation to $\{f(x_i)\}$ $(i = 1, 2, \ldots)$ by means of a constant is $E_0(f) = \tfrac{1}{2}|f(x_r) - f(x_s)|$, $f(x_r)$, $f(x_s)$ being respectively the largest and the smallest of the $f(x_i)$, so that $|f(x_r) - f(x_s)|$ is the largest possible, and the "constant of the best approximation" is $\Pi_0 = \tfrac{1}{2}[f(x_r) + f(x_s)]$. Thus here

(51) $\lim_{m \to \infty} X_m = \tfrac{1}{2}[f(x_r) + f(x_s)]$,

$\lim_{m \to \infty} \left\{\sum_{i=1}^{M} \sigma_i |f(x_i) - X_m|^m\right\}^{1/m} = \tfrac{1}{2}|f(x_r) - f(x_s)| = \tfrac{1}{2} \max. |f(x_i) - f(x_j)|$ $(i, j = 1, 2, \ldots, M)$.

(52) $f(x_1) < f(x_2) < \cdots < f(x_M)$ implies:

$\lim_{m \to \infty} X_m = \dfrac{f(x_1) + f(x_M)}{2}$,  $\lim_{m \to \infty} \left\{\sum_{i=1}^{M} \sigma_i |f(x_i) - X_m|^m\right\}^{1/m} = \dfrac{f(x_M) - f(x_1)}{2}$,

and the limiting results do not depend on $x_2, x_3, \ldots, x_{M-1}$. As an illustration $f(x) = x^{2k+1}$ may serve, or, more generally, $f(x) = \sum_i A_i x^{2\lambda_i + 1}$ (all $A_i > 0$; all $\lambda_i$ and $k$ are positive integers or zero)¹.

1. That is: $E_n(f) = \max. |f(x) - \Pi_n(x)| \le \max. |f(x) - G_n(x)|$ $(a \le x \le b)$, where $G_n(x)$ is an arbitrary polynomial of degree $\le n$, equality implying necessarily: $G_n \equiv \Pi_n$.
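The limiting behavior in case (a) can be observed numerically. The sketch below is only an illustration under assumed data: the values $f(x_i)$ and weights $\sigma_i$ are hypothetical, and the minimizing constant is located by a ternary search, which is a technique chosen here for convenience (the sum is a convex function of the constant), not one taken from the text.

```python
def power_sum(values, weights, c, m):
    """Sum of sigma_i * |f(x_i) - c|^m."""
    return sum(w * abs(v - c) ** m for v, w in zip(values, weights))

def best_constant(values, weights, m, iterations=200):
    """Minimizer of the convex function c -> power_sum(values, weights, c, m), by ternary search."""
    lo, hi = min(values), max(values)
    for _ in range(iterations):
        left = lo + (hi - lo) / 3.0
        right = hi - (hi - lo) / 3.0
        if power_sum(values, weights, left, m) < power_sum(values, weights, right, m):
            hi = right
        else:
            lo = left
    return 0.5 * (lo + hi)

values = [0.0, 0.1, 0.3, 1.0]          # hypothetical f(x_i)
weights = [1.0, 2.0, 3.0, 1.0]         # hypothetical sigma_i
x_2 = best_constant(values, weights, 2)        # weighted mean for m = 2
x_100 = best_constant(values, weights, 100)    # close to the mid-range for large m
print(x_2, x_100)
```

For $m = 2$ the constant is the weighted mean (0.3 with these data), while for $m = 100$ it is already very near $\tfrac{1}{2}[0 + 1] = 0.5$, independently of the interior values and of the weights.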


(b) $M = n + 2$, $n$ arbitrary. Here the writer showed (the paper will appear elsewhere):


(53) Zim: P,(x}I,(z) 


= Lent loon f mp CK, Achs (2 jezd- 2x?) 


($4) lim, [2 — "| “= B,(f) 


* We herd 


eae for fes (xJ-x),) 


we ke/, 2, M 
Siz fies) (i= 1, ey --M) 





where A, 1, ; stand respectively for the following determinant and 
its minors: 


2 3 ;3 n 
| zR- xy X}-xp xP- x) 
X,- Xs X,-2X; X,- Xs 

! 2 2 a 
| at. ae 
t.- X 2.- % 
(55) K= 5 
. 2 & a a 
|S - ~ 2. arn 

-=F. 


n+2 


Zn~ Xnez 


1. Cf. D. Jackson, Note on the Median of a Set of Numbers, Bulletin of the American Mathematical Society, v. 22 (1920), p. 160-64, where the above results have been obtained for the particular case $f(x) = x$.

We proceed now to show the application of Stieltjes integrals to interpolation. This must be preceded by a discussion of

VI. Orthogonal Tchebycheff Polynomials.

Theorem. Any function $\psi(x)$, monotonic non-decreasing on $(a, b)$, finite or infinite, and having all moments $\gamma_k = \int_a^b x^k\,d\psi(x)$ $(k = 0, 1, \ldots)$ with $\gamma_0 > 0$, generates a sequence of polynomials $\{\varphi_n(x)\}$ of degree $n = 0, 1, \ldots$ uniquely determined by the relations¹:

$\int_a^b \varphi_m\,\varphi_n\,d\psi = 0$ $(m \ne n;\ m, n = 0, 1, \ldots)$,

or, which is the same,

$\int_a^b x^k \varphi_n(x)\,d\psi(x) = 0$ $(k = 0, 1, \ldots, n-1)$.











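Proof. Take $\varphi_n(x) = x^n + f_{n-1} x^{n-1} + \cdots + f_1 x + f_0$. The above relations lead to the following set of equations:

(56) $f_0 \gamma_k + f_1 \gamma_{k+1} + \cdots + f_{n-1} \gamma_{k+n-1} + \gamma_{k+n} = 0$ $(k = 0, 1, \ldots, n-1)$.

The determinant $\Delta_n$ of the coefficients $\gamma_k$ is (see (37)):

(57) $\Delta_n = \left|\gamma_{i+j-2}\right|_{i,j=1}^{n} = \dfrac{1}{n!} \int_a^b \cdots \int_a^b \prod_{i<j} (x_i - x_j)^2\,d\psi(x_1) \cdots d\psi(x_n) > 0$ ($n$ times),

which proves our statement. Add to (56) the identical relation

$f_0 + f_1 x + \cdots + f_{n-1} x^{n-1} + (x^n - \varphi_n) = 0$, and for $\varphi_n(x)$ we obtain the following expression:

(58) $\varphi_n(x) = \dfrac{1}{\Delta_n} \begin{vmatrix} \gamma_0 & \gamma_1 & \cdots & \gamma_n \\ \gamma_1 & \gamma_2 & \cdots & \gamma_{n+1} \\ \vdots & \vdots & & \vdots \\ \gamma_{n-1} & \gamma_n & \cdots & \gamma_{2n-1} \\ 1 & x & \cdots & x^n \end{vmatrix}$

Note. If $\psi(x)$ has in $(a, b)$ only a finite number $M$ of points of increase, then $\Delta_n = 0$ for $n > M$, and $\varphi_n(x)$ exists only for $n = 0, 1, \ldots, M$ (See below (65), which in this case is a rational fraction).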








The following table gives the best-known and most important Tchebycheff polynomials.

1. We disregard constant factors.
fe oe oe 


(x- ora dx\| Jacobi: 
Finite (ya >0) | (x-eite-af “Tel al (b-xJ 
x*e = dx Laguerre : g” 
(%, KX >o) x ““e “- a [ze 


an ner Hermite: 
fe “ax(kooy emsda(e sy 


The polynomials $\varphi_n(x)$ can be normalized, by multiplying by constant factors $a_n = 1 : \sqrt{\int_a^b \varphi_n^2(x)\,d\psi}$, so as to obtain an orthogonal and normal system of Tchebycheff polynomials $\{\varphi_n(x) = a_n x^n + \cdots\}$ $(n = 0, 1, \ldots;\ a_n > 0)$:

(59) $\int_a^b \varphi_m(x)\,\varphi_n(x)\,d\psi = 0$ $(m \ne n)$, $= 1$ $(m = n)$  $(m, n = 0, 1, \ldots)$.

The following are some of the most important properties of $\varphi_n(x)$.


(0, @) 


(a) The roots of $\varphi_n(x)$ are real, distinct and lie between $a$, $b$.

(b) If all integrals $\int_a^b f(x)\,\varphi_n(x)\,d\psi(x)$ exist $(n = 0, 1, \ldots)$ then, by (59), we have the formal development:

(60) $f(x) \sim \sum_{i=0}^{\infty} A_i \varphi_i(x)$  $\left[A_i = \int_a^b f(x)\,\varphi_i(x)\,d\psi(x)\right]$²

which, regardless of its convergence or divergence, has the following remarkable property: any "section" ("Abschnitt") of (60), i. e. the polynomial $P_n(x) = \sum_{i=0}^{n} A_i \varphi_i(x)$, obtained by taking its first $n + 1$ terms $(n = 0, 1, \ldots)$, gives the best approximation to $f(x)$ in $(a, b)$, in the sense of least squares, i. e. it minimises the integral $\int_a^b [f(x) - P_n(x)]^2\,d\psi(x)$. Moreover

(61) $\int_a^b [f - P_n]^2\,d\psi = \min. \int_a^b [f - G_n]^2\,d\psi(x) = \int_a^b f^2\,d\psi - \sum_{i=0}^{n} A_i^2$,

$G_n(x) = \sum_{i=0}^{n} g_i x^i$ denoting hereafter an arbitrary polynomial of degree $\le n$. The proof is very simple. Write $G_n(x)$ as $\sum_{i=0}^{n} H_i \varphi_i(x)$ with constant coefficients $H_i$, substitute this expression into $I = \int_a^b [f - G_n]^2\,d\psi$, and write down the conditions of minima: $\dfrac{\partial I}{\partial H_i} = 0$, which, by (59), lead to

$H_i = \int_a^b f\,\varphi_i\,d\psi = A_i$ $(i = 0, 1, \ldots, n)$.

These coefficients $A_i$ can be written down as linear combinations of the moments

(62) $m_k = \int_a^b f(x)\,x^k\,d\psi(x)$ $(k = 0, 1, \ldots)$.

Introduce the symbol

(63) $\omega(G_n) = \sum_{i=0}^{n} m_i g_i$  $\left(G_n(x) = \sum_{i=0}^{n} g_i x^i;\ n = 0, 1, \ldots;\ g_i \text{ arbitrary}\right)$.

Then evidently,

(64) $A_n = \int_a^b f\,\varphi_n\,d\psi = \omega(\varphi_n)$,  $f(x) \sim \sum_{n=0}^{\infty} \omega(\varphi_n)\,\varphi_n(x)$;

in other words, we have the following simple rule: In the expression of $\varphi_n(x)$ replace each power $x^k$ by the corresponding moment $m_k$ given in (62) $(k = 0, 1, \ldots, n)$, and we obtain the coefficient $A_n$ in (60) $(n = 0, 1, \ldots)$.

(c) $\varphi_n(x)$ are denominators of the successive convergents to the continued fraction

(65) $\int_a^b \dfrac{d\psi(y)}{x - y} = \cfrac{\lambda_1}{x - c_1 - \cfrac{\lambda_2}{x - c_2 - \cdots}}$  $(\lambda_i > 0,\ c_i \text{ const.})$.

1. Cf. W. Romanowsky, Sur quelques classes nouvelles des polynomes orthogonaux, Comptes Rendus, v. 188 (1929), p. 1023-25, where new polynomials are discussed arising from Pearson's frequency curves of type IV, V, VI.

2. In the development $f(x) \sim \sum A_i \varphi_i(x)$, where the $\varphi_i(x)$ are not normalized, $A_i = \int_a^b f\,\varphi_i\,d\psi : \int_a^b \varphi_i^2\,d\psi$.


Historically, it was the aforesaid minimum property which led Tchebycheff to the discovery and investigation of the general class of orthogonal polynomials corresponding to any monotonic non-decreasing function, while before, only isolated special cases of such polynomials had been known (polynomials of Legendre, Jacobi, Laguerre, Laplace, Hermite). Tchebycheff found these polynomials in connection with


VII. Least-squares Interpolation. The problem can be formulated with Tchebycheff¹ as follows: Given the values of a certain function $y = F(x)$ at $n + 1$ real, distinct points $x_1, x_2, \ldots, x_{n+1}$, with the corresponding weights $\sigma_i$². Find its value at $x = X$, assuming for $y$ the representation $a + bx + cx^2 + \cdots + hx^m$ $(m \le n)$, so that the errors of $F(x_i)$ $[i = 1, 2, \ldots, n+1]$ shall have the least possible influence on the required value $F(X)$.

Using Stieltjes integrals (which greatly simplifies Tchebycheff's analysis), we are led to the following solution:

(66) $F(X) = P_m(X) = \sum_{i=0}^{m} A_i \varphi_i(X)$  $\left[A_i = \int_a^b F(x)\,\varphi_i(x)\,d\psi(x) = \sum_{k=1}^{n+1} \sigma_k F(x_k)\,\varphi_i(x_k)\right]$,

where $\psi(x)$ is the stepwise function having at $x = x_i$ a saltus $\sigma_i$ $(i = 1, 2, \ldots, n+1)$ ($(a, b)$ contains in its interior all points $x_i$), $\{\varphi_n(x)\}$ are orthogonal and normal polynomials determined by (59), or, which is the same, denominators of the successive convergents to the continued fraction (65) (we disregard constant factors), which here reduces to

(67) $\sum_{i=1}^{n+1} \dfrac{\sigma_i}{x - x_i} = \cfrac{\lambda_1}{x - c_1 - \cfrac{\lambda_2}{x - c_2 - \cdots}}$

1. Tchebycheff, (a) Sur les fractions continues, Journal des Mathématiques, (2), v. III (1858), p. 289-323; (b) On the least-squares interpolation, Collected Papers, v. I, p. 473-98; (c) On interpolation with equidistant ordinates, ibid., v. II, p. 219-42 (b, c, in Russian).

2. $\sigma_i$ is inversely proportional to the mean-square error of $F(x_i)$.


We see that (66) is nothing but the first $m + 1$ terms of the development (60). Hence, Tchebycheff's solution (66) yields the minimum of

$\int_a^b [F(x) - P_m(x)]^2\,d\psi(x) = \sum_{i=1}^{n+1} \sigma_i [F(x_i) - P_m(x_i)]^2$. Moreover, for the mean-square error of (66), we get, by (59):

(68) $R^2 = \int_a^b F^2\,d\psi - \sum_{k=0}^{m} A_k^2 = \sum_{i=1}^{n+1} \sigma_i F^2(x_i) - \sum_{k=0}^{m} \left\{\sum_{i=1}^{n+1} \sigma_i F(x_i)\,\varphi_k(x_i)\right\}^2$.

The name "least-squares interpolation" is thus fully justified, and we see the complete identity between the two problems: least-squares interpolation and approximate representation of functions by series of Tchebycheff polynomials. Whether the data are discrete and in a finite number, or they form a continuous set, the underlying principles and the resulting formulae are identical, provided we use Stieltjes integrals. There is no need to treat the two cases separately (as one finds even in recent books on this subject) and to introduce special symbols in the first case. Another very important feature of the above solution has been indicated by Tchebycheff: If we add one more term to the expression $a + bx + \cdots + hx^m$ assumed for $y = F(x)$, we need only add one more term to $P_m(X)$ above, without changing the preceding ones (compare with Lagrange interpolation formula!). Formula (68) enables one to find the number of terms necessary to attain a prescribed accuracy.
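The solution (66) and the error formula (68) can be sketched for discrete data as follows. The orthogonal and normal polynomials are built here by a Gram-Schmidt process on $1, x, x^2, \ldots$ with respect to the weighted sums, which is an assumed construction chosen for brevity (the text obtains them from moments or from the continued fraction); the data, weights and function names are hypothetical.

```python
def poly_eval(coeffs, x):
    """Value at x of the polynomial with coefficients listed from the constant term upward."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result

def inner(p, q, xs, sigma):
    """Stieltjes integral of p*q for the stepwise psi with saltus sigma[i] at xs[i]."""
    return sum(s * poly_eval(p, x) * poly_eval(q, x) for x, s in zip(xs, sigma))

def orthonormal_polys(xs, sigma, m):
    """phi_0, ..., phi_m satisfying (59) on the given points (Gram-Schmidt on monomials)."""
    basis = []
    for k in range(m + 1):
        p = [0.0] * k + [1.0]                       # the monomial x^k
        for q in basis:
            c = inner(p, q, xs, sigma)
            padded = q + [0.0] * (len(p) - len(q))
            p = [a - c * b for a, b in zip(p, padded)]
        norm = inner(p, p, xs, sigma) ** 0.5
        basis.append([a / norm for a in p])
    return basis

xs = [0.0, 1.0, 2.0, 3.0, 4.0]          # hypothetical abscissae
sigma = [1.0, 1.0, 2.0, 1.0, 1.0]       # hypothetical weights
F = [1.0, 2.7, 5.8, 6.6, 7.5]           # hypothetical observed values
phis = orthonormal_polys(xs, sigma, 2)
A = [sum(s * y * poly_eval(phi, x) for x, y, s in zip(xs, F, sigma)) for phi in phis]

def P(x, m):
    """P_m(x): the first m+1 terms of the development, as in (66)."""
    return sum(A[k] * poly_eval(phis[k], x) for k in range(m + 1))

r2_formula = sum(s * y * y for y, s in zip(F, sigma)) - sum(a * a for a in A[:2])
r2_direct = sum(s * (y - P(x, 1)) ** 2 for x, y, s in zip(xs, F, sigma))
print(A, r2_formula, r2_direct)
```

Passing from $m = 1$ to $m = 2$ only appends the term `A[2]`; the earlier coefficients are untouched, which is the feature of the solution emphasized above.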


Consider two special cases. 


(a) The ordinates are equidistant: $x_{i+1} - x_i = h$ $(i = 1, 2, \ldots, n)$ and all weights $\sigma_i$ are equal $(= 1)$. Here Tchebycheff (l. c. 1-b, p. 91) gives very simple expressions for the polynomials $\varphi_k(x)$, as well as for the coefficients $A_k$ of (66):


$2 AY (z+ ASA) x4 258)» - (zs SoBAg. BEN, B3) 
(69) - -(g- AtEk-1)| k=0, /,2,°°+32= efizsze, 
= k * difference. 
(70) u(z5A2z)= DM D.(2): WRT abt RINu; $,(z) 


§ f i(i+/) (n-iXn-i-/]) rz 
int 21 n22 ey [2 1B Au; $, (2) ++ 


L tows F(x) ] 








(We have replaced $n + 1$ in our above formulae by $n$.) All $\varphi_k(z)$ can be easily computed by means of the relations:


$, (z)=O°/=/ 3 $(z)=2z2, 
(71) (z= ACA-)$, (2)-(k-)* |n*-(k-/)*] 8,2 (2) 
(A = 2) 
(b) m=/, x; arbitrary (i=/, 2,-+++n). We take in (67) 
(72) a=fov(er$ a. 


We get now (by successive division, for ex.) 


z 


fay 


(73) ¢, 
dy 


% 
7 - , wt 
(% -/x*dy (@)= 0; 23 ) 


9, (2) Vx 


d (x)= ,~C, - % x-% 
eo Ge~-c,)* dy y vo 1-7" 


P (x)=A, $, (x)+A, $, (x) 


n 2 

- UY; Ti ¥; (% Xi-Y, 

=a St = 
Z . - Y. ¥,-7," (% x-7%) 


[ y= P (3) 
A? (mean-square error) = s Co; [F (x,)-F,, (x)] . 
(76) = Le )e () VI 7 
(See 68) 


Let $\psi(x)$ represent a law of distribution. Then $\gamma_0 = 1$, $\sqrt{\gamma_2 - \gamma_1^2}$ = standard deviation $\sigma$, and the above formulae become:













(77) .(x)=/, $¢(a¥= 
b 
a/Fardy 
(78) Ria)= 2 
x*dy 


x) o. x; Yy, 
= ose [(y=F (zy | 


o, x; 


(79) Rs [Fav /P6, dy) 
: ye y rhe FM, mH)" 














One recognizes in (78, 79) formulae quite similar to those for the line of regression of $y$ on $x$ and for the standard error of estimate of $y$. Introduce

(80) $\sigma_x^2 = \int x^2\,d\psi$;  $\sigma_y^2 = \int y^2\,d\psi$;  $r = \dfrac{\int xy\,d\psi}{\sqrt{\int x^2\,d\psi \cdot \int y^2\,d\psi}}$

and our formulae become the classical ones:

(81) $P(x) = r\,\dfrac{\sigma_y}{\sigma_x}\,x$;  $R = \sigma_y (1 - r^2)^{1/2}$.

We thus obtained, using Stieltjes integrals, elegant, simple and easily memorizable formulae for $\sigma_x$, $\sigma_y$ and for the coefficient of correlation $r$. Moreover, we see by inspection (Schwarz inequality) that $-1 \le r \le 1$, equality attainable if and only if $x$ and $y$ are linearly dependent. We see also that the theory of linear regression is but a very special case of the general theory, due to Tchebycheff, of least-squares interpolation.
1. Cf: D. Jackson. The Elementary Geometry of Function Space, American 
Mathematical Monthly, v. 31 (1924), p. 461-71. 
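The classical formulae (81) may be checked on a small discrete law. In the sketch below the pairs $(x_i, y_i)$ and their probabilities are hypothetical, and $x$, $y$ are measured from their means, which is an assumption made so that the line of regression passes through the origin as in (81).

```python
from math import sqrt

def regression_from_law(xs, ys, probs):
    """sigma_x, sigma_y, r, slope and R of (80)-(81) for a discrete law; x, y taken from their means."""
    mx = sum(p * x for x, p in zip(xs, probs))
    my = sum(p * y for y, p in zip(ys, probs))
    dx = [x - mx for x in xs]
    dy = [y - my for y in ys]
    sx = sqrt(sum(p * a * a for a, p in zip(dx, probs)))
    sy = sqrt(sum(p * b * b for b, p in zip(dy, probs)))
    r = sum(p * a * b for a, b, p in zip(dx, dy, probs)) / (sx * sy)
    slope = r * sy / sx
    R = sy * sqrt(1.0 - r * r)
    # direct root-mean-square deviation from the regression line, for comparison
    rms = sqrt(sum(p * (b - slope * a) ** 2 for a, b, p in zip(dx, dy, probs)))
    return r, slope, R, rms

xs = [1.0, 2.0, 3.0, 4.0, 5.0]          # hypothetical values of x
ys = [1.2, 1.9, 3.2, 3.8, 5.1]          # hypothetical values of y
probs = [0.2, 0.2, 0.2, 0.2, 0.2]       # equal probabilities, summing to 1
r, slope, R, rms = regression_from_law(xs, ys, probs)
print(r, slope, R, rms)
```

The value of $R$ from (81) agrees with the root-mean-square deviation computed directly from the fitted line.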


Paris (France). 


SIMULTANEOUS TREATMENT OF DISCRETE AND 
CONTINUOUS PROBABILITY BY USE 
OF STIELTJES INTEGRALS 


By 


WILLIAM DOWELL BATEN


The object of this paper is to present several theorems pertaining 
to the probability that certain functions lie within certain intervals. 
The first theorem is a generalization of Markoff’s Lemma, which is 
proven for the discrete and continuous cases by use of the accumulative 
frequency function and Stieltjes integrals. Tchebycheff’s Theorem is 
obtained as a corollary to a very general theorem, the proof of which is 
based upon the first theorem. Other corollaries are given. 


Three theorems, due to Guldberg, which follow are concerned with 
the probability that a non-negative chance variable be less than certain 
functions of the expected value of the variable. These are proved for 
the discrete and continuous cases by employing accumulative frequency 
functions and Stieltjes integrals. This is the first time, as far as the 
writer knows, the discrete and continuous cases for these theorems 
have been included in a single proof. 


Theorem 1. If $A$ denotes the expected value of the non-negative variable $x$ and $t$ is any number greater than 1, then the probability that $x \le At^2$ is greater than $1 - \dfrac{1}{t^2}$.

Proof: If $x$ is a discrete variable with values at $x_i$ $(i = 1, 2, \ldots, m)$ with corresponding probabilities $p_i$, then it is understood that the probability that $x$ takes other values is zero. If $x$ is a continuous variable having a probability function defined over the interval $(a, b)$, then it is understood that the probability that $x$ lies outside of $(a, b)$ is zero in case $(a, b)$ is different from $(-\infty, +\infty)$. In both cases $x$ is a continuous variable in the interval $(-\infty, +\infty)$. Let the probability that $x$ lies in the interval $(-\infty, x)$ be $F(x)$, with $F(-\infty) = 0$ and $F(+\infty) = 1$. Then the probability that $x$ lies in the interval $(x_1, x_2)$ is

$F(x_2) - F(x_1) + \tfrac{1}{2}\{F(x_1 + 0) - F(x_1 - 0) + F(x_2 + 0) - F(x_2 - 0)\}$

where the last two terms are different from zero when there is probability different from zero at $x_1$ and $x_2$. This exists since $F(x)$ is a non-decreasing function over the interval $(-\infty, +\infty)$. In the continuous case when $F(x) = \int_{-\infty}^{x} f(x)\,dx$ where $f(x)$ is summable, $f(x)\,dx$ represents the probability that $x$ lies in the interval $(x, x + dx)$.















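In either case, by definition

$A = \int_{-\infty}^{\infty} x\,dF(x)$.

$x > At^2$ in the interval $(At^2 + \epsilon, \infty)$, where $\epsilon$ approaches 0, hence

$A > \int_{At^2 + 0}^{\infty} x\,dF(x)$.

But

$\int_{At^2 + 0}^{\infty} x\,dF(x) = \lim_{\epsilon \to 0} \int_{At^2 + \epsilon}^{\infty} x\,dF(x) = \lim_{\epsilon \to 0} z_\epsilon \int_{At^2 + \epsilon}^{\infty} dF(x) = \lim_{\epsilon \to 0} z_\epsilon \cdot \lim_{\epsilon \to 0} \int_{At^2 + \epsilon}^{\infty} dF(x)$

by the first theorem of the mean, which holds for Stieltjes integrals in this case. Here $z_\epsilon > At^2 + \epsilon$, hence $\lim z_\epsilon \ge At^2$, therefore

$A > At^2 \int_{At^2 + 0}^{\infty} dF(x)$. But $\int_{At^2 + 0}^{\infty} dF(x)$ is the probability $P$ that $x$ is greater than $At^2$, hence

$A > At^2 P$,  $Q > 1 - \dfrac{1}{t^2}$,

where $Q$ is the probability that $x \le At^2$.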





This theorem is a generalization of Markoff’s Lemma’, which he 
proved for the discrete case. The above proof takes care of the dis- 
crete case, the continuous case and the case which is a combination 
of the discrete and continuous. 
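Theorem 1 is easily tried on a numerical case. The sketch below uses an assumed discrete law for a non-negative variable (values and probabilities are hypothetical) and compares the probability $Q$ that $x \le At^2$ with the bound $1 - 1/t^2$ for several $t$.

```python
def check_theorem_1(values, probs, t):
    """Return (Q, bound): Q = P(x <= A t^2) and bound = 1 - 1/t^2, A being the expected value."""
    A = sum(v * p for v, p in zip(values, probs))
    Q = sum(p for v, p in zip(values, probs) if v <= A * t * t)
    return Q, 1.0 - 1.0 / (t * t)

values = [0.0, 1.0, 2.0, 10.0, 50.0]    # hypothetical non-negative values
probs = [0.3, 0.4, 0.2, 0.08, 0.02]     # hypothetical probabilities, summing to 1
checks = [check_theorem_1(values, probs, t) for t in (1.5, 2.0, 5.0)]
print(checks)
```

In each case $Q$ exceeds the bound; the bound is crude here because most of the probability sits well below $At^2$.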









Theorem 2. If $f(x_1, x_2, \ldots, x_n)$ is a function of $n$ independent variables, then the probability that

$|f - k| \le t\sqrt{E(f^2) - 2kE(f) + k^2}$











1. “Wahrscheinlichkeitsrechnung,” by Markoff. 1912. Page 54. 


is greater than $1 - \dfrac{1}{t^2}$; where $E$ represents the expected value, $k$ is a constant and $t > 1$.

Proof: Let $y = [f(x_1, x_2, \ldots, x_n) - k]^2$, then

$E(y) = E(f^2) - 2kE(f) + k^2$.

By theorem 1 the probability that

$|f - k| \le t\sqrt{E(f^2) - 2kE(f) + k^2}$

is greater than $1 - \dfrac{1}{t^2}$.


Corollary: If $f(x_1, x_2, \ldots, x_n) = x_1 + x_2 + \cdots + x_n$ and $k = \sum E(x_i)$, then theorem 2 becomes the famous Tchebycheff theorem¹. This theorem is: If $x_1, x_2, \ldots, x_n$ be independent variables, then the probability that

$\left|\sum_{i=1}^{n} x_i - \sum_{i=1}^{n} E(x_i)\right| \le t\left\{\sum_{i=1}^{n} E(x_i^2) - \sum_{i=1}^{n} [E(x_i)]^2\right\}^{1/2}$,

is greater than $1 - \dfrac{1}{t^2}$.

This proof is by far simpler than that given by Tchebycheff, while it is similar to that given by Markoff.


In the corollary if k SE (x;) , L(z)=aF(z,*)*A 
then the probability that *” 


236 . = 7. 2 
is greater than /- 4s . 


Ms . 22): az then the probability that 


pp =,'~ k| st 2 Ass t2h 945 a4. 2k 93,9 +k? 


is greater than /- Yee , where the variables are independent, 
£(x"\*A,2. ; Elxf)=4,, - If $ is negative it is under- 


1. "Des Valeurs Moyennes," by Tchebycheff, Journal de Math. 1867 (2). Vol. 12.


stood that 2 can not take on the value zero, if s=4/, , where @ 
is odd and } is even, it is understood that 2 is non-negative. 


Other interesting results may be obtained from this theorem if $f(x_1, x_2, \ldots, x_n)$ represents various functions of the $n$ independent variables and $k$ be given different values. If $f$ is the sum of the variables, theorem 2 is a more general theorem than Tchebycheff's theorem because of the constant $k$, which may have values other than $\sum E(x_i)$.










Let $x_i$ be the result of an individual throw of a coin, $x_i = 1$ if a head is thrown and $x_i = 0$ if a tail is thrown; then $E(x_i) = p \cdot 1 + q \cdot 0$, where $p$ is the probability of a head and $q$ is the probability of a tail. Let $m$ represent the number of heads thrown in $n$ throws and let $k = np + \sqrt{n - npq}$, then the probability that

$\left|m - (np + \sqrt{n - npq})\right| \le t\sqrt{n}$, or that

$\left|\dfrac{m}{n} - \left(p + \sqrt{\dfrac{1 - pq}{n}}\right)\right| \le \dfrac{t}{\sqrt{n}}$, is greater than $1 - \dfrac{1}{t^2}$.

Let $t = \sqrt[4]{n}$, then the probability that

$p + \sqrt{\dfrac{1 - pq}{n}} - \dfrac{1}{\sqrt[4]{n}} \le \dfrac{m}{n} \le p + \sqrt{\dfrac{1 - pq}{n}} + \dfrac{1}{\sqrt[4]{n}}$

is greater than $1 - \dfrac{1}{t^2}$ or $1 - \dfrac{1}{\sqrt{n}}$, which approaches unity as $n$ increases. It is near unity for large values of $n$. This shows that the empirical probability approaches the true probability $p$ as the number of throws increases, and the advantage of $k$.
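The coin example can be compared with exact binomial probabilities. The following sketch takes the constants $k = np + \sqrt{n - npq}$ and $t = \sqrt[4]{n}$ from the text; the function name and the particular values of $n$ and $p$ are assumptions made for the illustration.

```python
from math import comb, sqrt

def interval_probability(n, p):
    """Exact probability that |m - k| <= t*sqrt(n), with the bound 1 - 1/t^2 = 1 - 1/sqrt(n)."""
    q = 1.0 - p
    k = n * p + sqrt(n - n * p * q)      # the constant k of the text
    t = n ** 0.25
    half_width = t * sqrt(n)
    prob = sum(comb(n, m) * p ** m * q ** (n - m)
               for m in range(n + 1) if abs(m - k) <= half_width)
    return prob, 1.0 - 1.0 / t ** 2

results = [interval_probability(n, 0.5) for n in (16, 100, 400)]
print(results)
```

The exact probabilities lie above the bound in every case, and both approach unity as $n$ increases.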









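Theorem 3: Let $\mu_{n:0}$ be the expected value of the non-negative variable $x$ raised to the power $n$ and $t$ any number greater than 1, then the probability that $x \le t\sqrt[n]{\mu_{n:0}}$ is greater than $1 - \dfrac{1}{t^n}$.

Proof: Let $c > \sqrt[n]{\mu_{n:0}}$, and let $F(x)$ be the probability that $x$ lies in the interval $(-\infty, x)$, then by definition

$\mu_{n:0} = \int_0^{\infty} x^n\,dF(x)$; and $\dfrac{\mu_{n:0}}{c^n} = \int_0^{\infty} \dfrac{x^n}{c^n}\,dF(x)$.

$\dfrac{\mu_{n:0}}{c^n} > \int_{c+0}^{\infty} \dfrac{x^n}{c^n}\,dF(x) = \lim_{\epsilon \to 0} \int_{c+\epsilon}^{\infty} \dfrac{x^n}{c^n}\,dF(x) = \lim_{\epsilon \to 0} \left(\dfrac{z_\epsilon}{c}\right)^n \cdot \lim_{\epsilon \to 0} \int_{c+\epsilon}^{\infty} dF(x) > \int_{c+0}^{\infty} dF(x)$,

by the first theorem of the mean, and since $\lim \left(\dfrac{z_\epsilon}{c}\right)^n \ge 1$. Since $\int_{c+0}^{\infty} dF(x)$ is the probability $P$ that $x$ is greater than $c$,

$\dfrac{\mu_{n:0}}{c^n} > P$ or $Q > 1 - \dfrac{\mu_{n:0}}{c^n}$,

where $Q$ is the probability that $x \le c$.

Let $t = \dfrac{c}{\sqrt[n]{\mu_{n:0}}}$, then $\dfrac{1}{t^n} = \dfrac{\mu_{n:0}}{c^n}$, hence

$Q > 1 - \dfrac{1}{t^n}$.

But $Q$ becomes the probability that $x \le t\sqrt[n]{\mu_{n:0}}$, since $c$ was any number greater than $\sqrt[n]{\mu_{n:0}}$.

Let $y = |x - k|$, then theorem 3 becomes: If $\mu_{n:k}$ is the expected value of $|x - k|^n$ and $t$ is greater than 1, then the probability that $|x - k|$ does not surpass the multiple $t\sqrt[n]{\mu_{n:k}}$ is greater than $1 - \dfrac{1}{t^n}$, where $k$ is a constant.

If $k = \int_{-\infty}^{\infty} x\,dF(x)$, then $\mu_{n:k}$ becomes $\mu_n$ and theorem 3 states that the probability that the difference $|x - k|$ does not surpass the multiple $t\sqrt[n]{\mu_n}$ is greater than $1 - \dfrac{1}{t^n}$. In this special case theorem 3 becomes Guldberg's theorem¹, but this is more general than his theorem, for it includes the continuous case, the discrete case and the case which is a combination of the discrete and continuous.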


If $y = |f(x) - k|$ is used for the variable, a more general theorem is obtained. Here $f(x)$ is a function of $x$. Of course, the probability law for $f(x)$ must be secured from that of $x$ if the continuous case is under consideration. Certain restrictions must be placed upon $f(x)$ concerning continuity, summability and concerning the inverse.


_- 


1. "Sur un théorème de M. Markoff," by Alf. Guldberg. Comptes Rendus, Vol. 175 (1922), page 679.







Theorem 4. The probability that the difference | -m]| is not 
greater than the multiple ¢u,., ¢>/, is greater than /- (7/y fee. a 
{ 442), where w,.~ is the expected value of |x-m|"%, and 2 is the 
expected value of z. 


Theorem 5. The probability that the positive quantity 2 does not 
” 


surpass the multiple ¢m , ( ¢ >/), is greater than /-( = 2)" cor ; 


where &y,2 is the expected value of j2-m|" and me: Z(z) . 


These last two theorems are due to Guldberg¹ for the discrete case. By the method used in theorem 3 these can be proven for the continuous case, the discrete case, and the case which is a combination of the discrete and continuous.


1. "Sur quelques inégalités dans le calcul des probabilités," by Guldberg. Comptes Rendus, Vol. 175 (1922), p. 1382.

"Sur le théorème de Tchebycheff," by Guldberg. Comptes Rendus, Vol. 175 (1922), p. 418.





EDITORIAL 


FUNDAMENTALS OF THE THEORY OF SAMPLING 


I. SAMPLING FROM A LIMITED SUPPLY 


We shall consider first a population of $s$ individuals, in which each individual possesses a common attribute that can be measured quantitatively. The sum of the associated variates may be expressed as follows:

\[ x_1 + x_2 + x_3 + \cdots + x_s = \sum x = sM_x. \]


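From this so-called parent population it is possible to select $\binom{s}{r}$ different samples, each consisting of $r$ individuals, ($r < s$). These samples may be ordered after any fashion, and the algebraic sum of the variates for the respective samples may be designated

\[ z_1 = x_1 + x_2 + x_3 + \cdots + x_r = \sum\nolimits^{(1)} x \]

\[ z_2 = x_1 + x_2 + \cdots + x_{r-1} + x_{r+1} = \sum\nolimits^{(2)} x \]

\[ \cdots \]

\[ z_{\binom{s}{r}} = x_{s-r+1} + \cdots + x_{s-1} + x_s = \sum\nolimits^{\left(\binom{s}{r}\right)} x \]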


Thus, while $\sum x$ represents the sum of all the $s$ variates in the parent population, $\sum^{(i)} x$ designates the sum of the $r$ variates occurring in the $i$th sample.

We face now the problem of describing adequately, from a statistical point of view, the distribution of these $\binom{s}{r}$ values of $z$, that is to say, we must express the moments $\mu_{n:z}$ in terms of the moments of the parent population, $\mu_{n:x}$.

By definition

\[ M_z = \frac{\sum z}{\binom{s}{r}}. \]













Since each value of $z$ will contribute $r$ terms to the value of $\sum z$, this latter expression will consist of $r\binom{s}{r}$ terms involving each of the $s$ variates of the parent population alike. Therefore, each variate, $x_i$ ($i = 1, 2, 3, \ldots, s$), will occur in the expression for $\sum z$ exactly $\frac{r}{s}\binom{s}{r}$ times. Consequently

\[ M_z = \frac{\sum z}{\binom{s}{r}} = \frac{1}{\binom{s}{r}} \cdot \frac{r}{s}\binom{s}{r}\,(x_1 + x_2 + \cdots + x_s) = \frac{r}{s}\sum x = rM_x. \tag{1} \]
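We shall now investigate the values of

\[ \mu_{n:z} = \frac{\sum \bar{z}^{\,n}}{\binom{s}{r}}, \]

where we choose to represent a deviation from the mean as

\[ \bar{z}_i = z_i - M_z. \]

Observing that

\[ \bar{z}_1 = z_1 - M_z = x_1 + x_2 + \cdots + x_r - rM_x = \bar{x}_1 + \bar{x}_2 + \cdots + \bar{x}_r, \]

we note that

\[ \bar{z}_1^{\,2} = \sum\nolimits^{(1)} \bar{x}^2 + 2\sum\nolimits^{(1)} \bar{x}_i\bar{x}_j \]

\[ \bar{z}_2^{\,2} = \sum\nolimits^{(2)} \bar{x}^2 + 2\sum\nolimits^{(2)} \bar{x}_i\bar{x}_j \]

\[ \cdots \]

Therefore

\[ \mu_{2:z} = \frac{\binom{s-1}{r-1}}{\binom{s}{r}} \sum \bar{x}^2 + 2\,\frac{\binom{s-2}{r-2}}{\binom{s}{r}} \sum \bar{x}_i\bar{x}_j, \]

or, writing

\[ \rho_n = \frac{r(r-1)(r-2)\cdots(r-n+1)}{s(s-1)(s-2)\cdots(s-n+1)}, \]

\[ \mu_{2:z} = 2!\left\{ \rho_1 \frac{\sum \bar{x}^2}{2!} + \rho_2 \frac{\sum \bar{x}_i\bar{x}_j}{1!\,1!} \right\}. \tag{2a} \]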





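By utilizing further the multinomial theorem, it follows easily that

\[ \mu_{3:z} = 3!\left\{ \rho_1 \frac{\sum \bar{x}^3}{3!} + \rho_2 \frac{\sum \bar{x}_i^2\bar{x}_j}{2!\,1!} + \rho_3 \frac{\sum \bar{x}_i\bar{x}_j\bar{x}_k}{1!\,1!\,1!} \right\} \tag{3a} \]

\[ \mu_{4:z} = 4!\left\{ \rho_1 \frac{\sum \bar{x}^4}{4!} + \rho_2 \frac{\sum \bar{x}_i^3\bar{x}_j}{3!\,1!} + \rho_2 \frac{\sum \bar{x}_i^2\bar{x}_j^2}{2!\,2!} + \rho_3 \frac{\sum \bar{x}_i^2\bar{x}_j\bar{x}_k}{2!\,1!\,1!} + \rho_4 \frac{\sum \bar{x}_i\bar{x}_j\bar{x}_k\bar{x}_l}{1!\,1!\,1!\,1!} \right\} \tag{4a} \]

etc.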


The rule for writing down the terms is as follows: The number of terms in the expression for $\mu_{n:z}$ equals the number of partitions that can be formed from the integer $n$. The subscript of $\rho$ equals the number of elements in the corresponding partition, and exponents of $\bar{x}$ and the factorials in the denominators are in fact the elements of the partitions.

Our next problem is to express the summations in terms of moments of the parent population, $\mu_{n:x}$.


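First order summation

\[ \sum \bar{x} = s\mu_{1:x} = 0 \]

Second order summations

\[ \sum \bar{x}^2 = s\mu_{2:x} \]
\[ 2\sum \bar{x}_i\bar{x}_j = -s\mu_{2:x} \]

since $\left(\sum \bar{x}\right)^2 = 0 = \sum \bar{x}^2 + 2\sum \bar{x}_i\bar{x}_j$.

Third order summations

\[ \sum \bar{x}^3 = s\mu_{3:x} \]
\[ \sum \bar{x}_i^2\bar{x}_j = -s\mu_{3:x} \]
\[ 6\sum \bar{x}_i\bar{x}_j\bar{x}_k = 2s\mu_{3:x} \]

since $\sum \bar{x}^2 \cdot \sum \bar{x} = 0 = \sum \bar{x}^3 + \sum \bar{x}_i^2\bar{x}_j$

and $\left(\sum \bar{x}\right)^3 = 0 = \sum \bar{x}^3 + 3\sum \bar{x}_i^2\bar{x}_j + 6\sum \bar{x}_i\bar{x}_j\bar{x}_k$.

Fourth order summations

\[ \sum \bar{x}^4 = s\mu_{4:x} \]
\[ \sum \bar{x}_i^3\bar{x}_j = -s\mu_{4:x} \]
\[ 2\sum \bar{x}_i^2\bar{x}_j^2 = -s\mu_{4:x} + s^2\mu_{2:x}^2 \]
\[ 2\sum \bar{x}_i^2\bar{x}_j\bar{x}_k = 2s\mu_{4:x} - s^2\mu_{2:x}^2 \]
\[ 24\sum \bar{x}_i\bar{x}_j\bar{x}_k\bar{x}_l = -6s\mu_{4:x} + 3s^2\mu_{2:x}^2 \]

Utilizing these summations, (2a), (3a) and (4a) may be written

\[ \mu_{2:z} = s\mu_{2:x}\{\rho_1 - \rho_2\} \tag{2} \]
\[ \mu_{3:z} = s\mu_{3:x}\{\rho_1 - 3\rho_2 + 2\rho_3\} \tag{3} \]
\[ \mu_{4:z} = s\mu_{4:x}\{\rho_1 - 7\rho_2 + 12\rho_3 - 6\rho_4\} + 3s^2\mu_{2:x}^2\{\rho_2 - 2\rho_3 + \rho_4\}. \tag{4} \]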
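
Formulae (2), (3) and (4) can be verified by brute force on a small population, enumerating every possible sample. The sketch below uses exact rational arithmetic; the seven-member population and all function names are hypothetical choices made for illustration.

```python
from fractions import Fraction
from itertools import combinations


def rho(r, s, n):
    """rho_n = r(r-1)...(r-n+1) / (s(s-1)...(s-n+1))."""
    out = Fraction(1)
    for j in range(n):
        out *= Fraction(r - j, s - j)
    return out


def central_moment(values, n):
    """n-th moment about the mean of a finite list of values."""
    mean = Fraction(sum(values), len(values))
    return sum((Fraction(v) - mean) ** n for v in values) / len(values)


def sample_total_moments(population, r):
    """Exact moments (orders 2 to 4) of the totals of all r-member samples."""
    totals = [sum(c) for c in combinations(population, r)]
    return {n: central_moment(totals, n) for n in (2, 3, 4)}


def formula_moments(population, r):
    """The same moments computed from formulae (2), (3) and (4)."""
    s = len(population)
    mu = {n: central_moment(population, n) for n in (2, 3, 4)}
    p = {n: rho(r, s, n) for n in (1, 2, 3, 4)}
    return {
        2: s * mu[2] * (p[1] - p[2]),
        3: s * mu[3] * (p[1] - 3 * p[2] + 2 * p[3]),
        4: s * mu[4] * (p[1] - 7 * p[2] + 12 * p[3] - 6 * p[4])
        + 3 * s**2 * mu[2] ** 2 * (p[2] - 2 * p[3] + p[4]),
    }


population = [1, 2, 2, 5, 7, 11, 12]  # hypothetical parent population, s = 7
print(sample_total_moments(population, 3) == formula_moments(population, 3))
```

Because fractions are used throughout, agreement is exact rather than approximate, so any slip in a coefficient of the formulae would show up immediately.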


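Continuing after this fashion, one can show after a lavish use of symmetric functions that

\[ \mu_{5:z} = s\mu_{5:x}\{\rho_1 - 15\rho_2 + 50\rho_3 - 60\rho_4 + 24\rho_5\} + 10s^2\mu_{3:x}\mu_{2:x}\{\rho_2 - 4\rho_3 + 5\rho_4 - 2\rho_5\}, \tag{5} \]

\[ \begin{aligned} \mu_{6:z} ={}& s\mu_{6:x}\{\rho_1 - 31\rho_2 + 180\rho_3 - 390\rho_4 + 360\rho_5 - 120\rho_6\} \\ &+ 15s^2\mu_{4:x}\mu_{2:x}\{\rho_2 - 8\rho_3 + 19\rho_4 - 18\rho_5 + 6\rho_6\} \\ &+ 10s^2\mu_{3:x}^2\{\rho_2 - 6\rho_3 + 13\rho_4 - 12\rho_5 + 4\rho_6\} \\ &+ 15s^3\mu_{2:x}^3\{\rho_3 - 3\rho_4 + 3\rho_5 - \rho_6\}, \end{aligned} \tag{6} \]

\[ \begin{aligned} \mu_{7:z} ={}& s\mu_{7:x}\{\rho_1 - 63\rho_2 + 602\rho_3 - 2100\rho_4 + 3360\rho_5 - 2520\rho_6 + 720\rho_7\} \\ &+ 21s^2\mu_{5:x}\mu_{2:x}\{\rho_2 - 16\rho_3 + 65\rho_4 - 110\rho_5 + 84\rho_6 - 24\rho_7\} \\ &+ 35s^2\mu_{4:x}\mu_{3:x}\{\rho_2 - 10\rho_3 + 35\rho_4 - 56\rho_5 + 42\rho_6 - 12\rho_7\} \\ &+ 105s^3\mu_{3:x}\mu_{2:x}^2\{\rho_3 - 5\rho_4 + 9\rho_5 - 7\rho_6 + 2\rho_7\}, \end{aligned} \tag{7} \]

\[ \begin{aligned} \mu_{8:z} ={}& s\mu_{8:x}\{\rho_1 - 127\rho_2 + 1932\rho_3 - 10206\rho_4 + 25200\rho_5 - 31920\rho_6 + 20160\rho_7 - 5040\rho_8\} \\ &+ 28s^2\mu_{6:x}\mu_{2:x}\{\rho_2 - 32\rho_3 + 211\rho_4 - 570\rho_5 + 750\rho_6 - 480\rho_7 + 120\rho_8\} \\ &+ 56s^2\mu_{5:x}\mu_{3:x}\{\rho_2 - 18\rho_3 + 97\rho_4 - 240\rho_5 + 304\rho_6 - 192\rho_7 + 48\rho_8\} \\ &+ 35s^2\mu_{4:x}^2\{\rho_2 - 14\rho_3 + 73\rho_4 - 180\rho_5 + 228\rho_6 - 144\rho_7 + 36\rho_8\} \\ &+ 210s^3\mu_{4:x}\mu_{2:x}^2\{\rho_3 - 9\rho_4 + 27\rho_5 - 37\rho_6 + 24\rho_7 - 6\rho_8\} \\ &+ 280s^3\mu_{3:x}^2\mu_{2:x}\{\rho_3 - 7\rho_4 + 19\rho_5 - 25\rho_6 + 16\rho_7 - 4\rho_8\} \\ &+ 105s^4\mu_{2:x}^4\{\rho_4 - 4\rho_5 + 6\rho_6 - 4\rho_7 + \rho_8\}. \end{aligned} \tag{8} \]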


It is convenient, at this point, to define the "$n$th sampling polynomial" as follows:

\[ P_n(\rho) = D_x^{\,n} \log\left(\rho e^x + 1 - \rho\right)\Big|_{x=0}. \tag{9} \]













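If we place $y = \log(\rho e^x + 1 - \rho)$, then

\[ y' = \frac{\rho e^x}{\rho e^x + 1 - \rho}, \quad \text{i.e.} \quad (\rho e^x + 1 - \rho)\,y' = \rho e^x. \]

Taking the $n$th derivative of both sides by utilizing Leibnitz' Theorem, we obtain

\[ (\rho e^x + 1 - \rho)\,y^{(n+1)} + \binom{n}{1}\rho e^x y^{(n)} + \binom{n}{2}\rho e^x y^{(n-1)} + \cdots + \rho e^x y' = \rho e^x. \]

Placing $x = 0$ in this equation yields, by definition,

\[ P_{n+1}(\rho) + \binom{n}{1}\rho P_n(\rho) + \binom{n}{2}\rho P_{n-1}(\rho) + \cdots + \rho P_1(\rho) = \rho. \]

That is, for $n = 0, 1, 2, \ldots$

\[ P_1(\rho) = \rho \]
\[ P_2(\rho) + \rho P_1(\rho) = \rho \]
\[ P_3(\rho) + 2\rho P_2(\rho) + \rho P_1(\rho) = \rho \]
\[ P_4(\rho) + 3\rho P_3(\rho) + 3\rho P_2(\rho) + \rho P_1(\rho) = \rho \]

etc.

Thus:

\[ \begin{aligned} P_1(\rho) &= \rho \\ P_2(\rho) &= \rho - \rho^2 \\ P_3(\rho) &= \rho - 3\rho^2 + 2\rho^3 \\ P_4(\rho) &= \rho - 7\rho^2 + 12\rho^3 - 6\rho^4 \\ P_5(\rho) &= \rho - 15\rho^2 + 50\rho^3 - 60\rho^4 + 24\rho^5 \\ P_6(\rho) &= \rho - 31\rho^2 + 180\rho^3 - 390\rho^4 + 360\rho^5 - 120\rho^6 \\ P_7(\rho) &= \rho - 63\rho^2 + 602\rho^3 - 2100\rho^4 + 3360\rho^5 - 2520\rho^6 + 720\rho^7 \\ P_8(\rho) &= \rho - 127\rho^2 + 1932\rho^3 - 10206\rho^4 + 25200\rho^5 - 31920\rho^6 + 20160\rho^7 - 5040\rho^8 \end{aligned} \tag{10} \]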


The law of formation of the coefficients is obvious: for if $C_{k:n}$ designates the coefficient of $\rho^k$ in the expression for $P_n(\rho)$,

\[ C_{k:n} = k\,C_{k:n-1} - (k-1)\,C_{k-1:n-1}. \]
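
The sampling polynomials can be generated mechanically from the relation obtained by Leibnitz' Theorem, $P_{n+1}(\rho) = \rho - \sum_{j=1}^{n} \binom{n}{j}\rho P_{n+1-j}(\rho)$. The sketch below stores each polynomial as a list of integer coefficients; the function name and the list representation are assumptions made for illustration.

```python
from math import comb


def sampling_polynomials(n_max):
    """Return {n: coeffs} where coeffs[k] is the coefficient of rho**k in P_n(rho)."""
    polys = {1: [0, 1]}  # P_1(rho) = rho
    for n in range(1, n_max):
        coeffs = [0] * (n + 2)  # P_{n+1} has degree n + 1
        coeffs[1] = 1           # the right-hand side, rho
        for j in range(1, n + 1):
            # subtract C(n, j) * rho * P_{n+1-j}(rho)
            for k, c in enumerate(polys[n + 1 - j]):
                coeffs[k + 1] -= comb(n, j) * c
        polys[n + 1] = coeffs
    return polys


polys = sampling_polynomials(8)
for n in range(1, 9):
    print(n, polys[n][1:])
```

Each printed row lists the coefficients of $\rho, \rho^2, \ldots, \rho^n$ in order, so the rows may be compared directly with the polynomials tabulated in the text.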


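Comparing the polynomials of equations (9) with formulae (2) to (8) inclusive, suggests writing the expressions for $\mu_{n:z}$ in the following symbolic form:

\[ \mu_{6:z} = 6!\left\{ \frac{P_6\, s\mu_{6:x}}{6!} + \frac{P_4P_2\, s^2\mu_{4:x}\mu_{2:x}}{4!\,2!} + \frac{P_3^2}{2!}\,\frac{s^2\mu_{3:x}^2}{(3!)^2} + \frac{P_2^3}{3!}\,\frac{s^3\mu_{2:x}^3}{(2!)^3} \right\}. \tag{11} \]

By $P_n$ we understand an expression derived from the sampling polynomial, $P_n(\rho)$, by writing $\rho^k$ as $\rho_k$. Thus,

\[ P_4(\rho) = \rho - 7\rho^2 + 12\rho^3 - 6\rho^4, \quad \text{whereas} \]
\[ P_4 = \rho_1 - 7\rho_2 + 12\rho_3 - 6\rho_4. \]

Again, since

\[ P_3(\rho) \cdot P_2(\rho) = (\rho - 3\rho^2 + 2\rho^3)(\rho - \rho^2) = \rho^2 - 4\rho^3 + 5\rho^4 - 2\rho^5, \]
\[ P_3P_2 = \rho_2 - 4\rho_3 + 5\rho_4 - 2\rho_5. \]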


The number of terms in the expression for $\mu_{n:z}$ will equal the number of partitions that can be formed from the integer $n$. The subscripts of the $P$ and $\mu$ factors for any selected term correspond to the elements of the corresponding partition, and the exponent of $s$ equals the number of elements in the partition. The factorials beneath the $\mu$ factors agree with the order of these moments, and the factorials appearing occasionally under the $P$ factors depend upon the number of times that any $P$ is repeated as a factor in that term. All terms arising from a partition in which unity is an element have been neglected, since such terms will contain $\mu_{1:x}$ as a factor and consequently be equal to zero.


Illustration I. For the parent population we shall select the following (it will be noted that graphically the ordinates terminate on the hypotenuse of an isosceles right triangle):


TABLE I


Parent Population


The mean, standard deviation and moments about the mean for this distribution are as follows:

    M_x       = 8.666
    \mu_{2:x} = 33.222        \sigma_x     = 5.76387
    \mu_{3:x} = 108.526       \alpha_{3:x} = .566749
    \mu_{4:x} = 2642.27       \alpha_{4:x} = 2.39398
    \mu_{5:x} = 20525.2       \alpha_{5:x} = 5.69279
    \mu_{6:x} = 322570        \alpha_{6:x} = 27.3878


It may well be remarked at this point that the standard variate corresponding to an observed variate, $x_i$, is

\[ t_i = \frac{x_i - M_x}{\sigma_x} = \frac{\bar{x}_i}{\sigma_x} \tag{12} \]

and is consequently an abstract number. The $n$th moment of the standard variates is also without unit, i. e.

\[ \alpha_{n:x} = \frac{1}{N}\sum t^n = \frac{1}{N\sigma_x^n}\sum \bar{x}^n = \frac{\mu_{n:x}}{\sigma_x^n}. \tag{13} \]





In dealing with distributions one should always bear in mind that 
the mean and standard deviation determine merely the position of the 
centroid vertical and the scale of the distribution, but that the standard 
moments are influenced by the shape of the distribution alone. Con- 
sequently a study of the mathematical representation of frequency dis- 
tributions is essentially an investigation concerning the standard mo- 
ments of observed and theoretical distributions. 


From the above parent population it would be possible to select $\binom{300}{25}$ samples, each consisting of 25 individuals. To describe the distribution of these samples, we proceed as follows:

    \rho_1 = 25/300           = .08333
    \rho_2 = \rho_1 (24/299)  = .0066 8896 32
    \rho_3 = \rho_2 (23/298)  = .0005 1626 226
    \rho_4 = \rho_3 (22/297)  = .0000 3824 1649
    \rho_5 = \rho_4 (21/296)  = .0000 0271 3090 0
    \rho_6 = \rho_5 (20/295)  = .0000 0018 3938 31

    P_2   = .0766 4437 0       P_6    = -.0450 5692
    P_3   = .0642 9896 8       P_4P_2 =  .0032 3772
    P_4   = .0424 7628 8       P_3^2  =  .0040 5670
    P_2^2 = .0056 9468 03      P_2^3  =  .0004 0949
    P_5   = .0065 8261 36

    M_z       = 216.66
    \mu_{2:z} = 763.88          \sigma_z     = 27.6385
    \mu_{3:z} = 2093.43         \alpha_{3:z} = .0991550
    \mu_{4:z} = 1730700         \alpha_{4:z} = 2.96594
    \mu_{5:z} = 15647600        \alpha_{5:z} = .970225
    \mu_{6:z} = 6503500000      \alpha_{6:z} = 14.5900
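
The lower-order figures of this illustration can be reproduced with a few lines of code. Table I is not available here, so the sketch below assumes a right-triangular population in which the value $x$ occurs $25 - x$ times for $x = 1, \ldots, 24$; this assumption gives 300 variates and agrees with the quoted mean and second moment, but it remains an assumption. The variable names are likewise illustrative.

```python
from math import sqrt

# ASSUMPTION: stand-in for Table I (value x occurs 25 - x times, x = 1..24).
population = [x for x in range(1, 25) for _ in range(25 - x)]


def central_moment(values, n):
    mean = sum(values) / len(values)
    return sum((v - mean) ** n for v in values) / len(values)


def rho(r, s, n):
    """rho_n = r(r-1)...(r-n+1) / (s(s-1)...(s-n+1))."""
    out = 1.0
    for j in range(n):
        out *= (r - j) / (s - j)
    return out


s, r = len(population), 25
mu2, mu3, mu4 = (central_moment(population, n) for n in (2, 3, 4))
p1, p2, p3, p4 = (rho(r, s, n) for n in (1, 2, 3, 4))

mean_z = r * sum(population) / s                      # formula (1)
mu2_z = s * mu2 * (p1 - p2)                           # formula (2)
mu3_z = s * mu3 * (p1 - 3 * p2 + 2 * p3)              # formula (3)
mu4_z = (s * mu4 * (p1 - 7 * p2 + 12 * p3 - 6 * p4)   # formula (4)
         + 3 * s**2 * mu2**2 * (p2 - 2 * p3 + p4))

sigma_z = sqrt(mu2_z)
alpha3_z = mu3_z / sigma_z**3
alpha4_z = mu4_z / sigma_z**4
print(mean_z, mu2_z, sigma_z, alpha3_z, alpha4_z)
```

Under the stated assumption the printed values should agree with the tabulated mean, second moment, standard deviation and the third and fourth standard moments of the samples to the number of figures given.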


As a check on this theory, three hundred Hollerith cards were 
punched with numbers corresponding to the three hundred variates of 
the parent population. The cards were thoroughly shuffled and then 
placed in a tabulating machine. After twenty-five cards had run 
through this electric tabulator, their total was recorded. By repeating 
this procedure one thousand samples were readily obtained and the
results are presented below. 
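
The card experiment can be imitated with a pseudo-random simulation. This is a sketch only: it assumes the same stand-in for Table I as a right-triangular population (value $x$ occurring $25 - x$ times, $x = 1, \ldots, 24$), and it draws every sample afresh from the full deck, which may differ in detail from the way the cards were actually run through the tabulator. The function name and the seed are arbitrary.

```python
import random
from statistics import mean, pstdev

# ASSUMPTION: stand-in for the 300 punched cards of Table I.
population = [x for x in range(1, 25) for _ in range(25 - x)]


def simulate_totals(population, r, n_samples, seed=0):
    """Totals of n_samples random samples of r cards drawn without replacement."""
    rng = random.Random(seed)
    return [sum(rng.sample(population, r)) for _ in range(n_samples)]


totals = simulate_totals(population, r=25, n_samples=1000)
print("mean of totals:", mean(totals))
print("standard deviation of totals:", pstdev(totals))
```

With one thousand samples the simulated mean and standard deviation should lie close to the theoretical values of the mean and standard deviation of the sample totals, the remaining differences being sampling fluctuations.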


TABLE II 


Distribution of the Totals of Samples of Twenty-five Variates 
Selected at Random from the Parent Population of Table I 





In this observed distribution it is found that

    M        = 215.84        \sigma   = 30.8505
    \alpha_3 = .1556 56      \alpha_5 = 1.39471
    \alpha_4 = 3.18939       \alpha_6 = 15.8603

The significance of the differences that exist between these functions and the values of $M_z$, $\sigma_z$ and $\alpha_{n:z}$ given above will be considered in a subsequent paper.


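The unmodified moments, $\nu$, for the preceding observed distribution were corrected for grouping by means of the following formula:

\[ \mu_n = \nu_n - \binom{n}{2}\frac{k^2 - 1}{12k^2}\,\nu_{n-2} + \binom{n}{4}\frac{(k^2 - 1)(7k^2 - 3)}{240k^4}\,\nu_{n-4} - \binom{n}{6}\frac{(k^2 - 1)(31k^4 - 18k^2 + 3)}{1344k^6}\,\nu_{n-6} + \cdots \tag{14} \]

where $k$ represents the number of different equidistant variates that can appear in each class. In our case, $k = 20$. Sheppard's corrections will appear as a special case of this formula by permitting $k$ to approach infinity. Thus

\[ \mu_n = \nu_n - \binom{n}{2}\frac{1}{12}\,\nu_{n-2} + \binom{n}{4}\frac{7}{240}\,\nu_{n-4} - \binom{n}{6}\frac{31}{1344}\,\nu_{n-6} + \cdots \tag{15} \]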

At first thought one is apt to be surprised in observing that the 
distribution of samples appearing in Table II is so nearly “normal,” 
whereas the samples were taken from a right-triangular parent popu- 
lation. As an even more extreme case, I may mention that a group of 
students chose arbitrarily the following most unusual distribution for 
a parent population : 


TABLE III 


*Compare with formulae (2b), page 94, Handbook of Mathematical Statistics. 













and found that the distribution of the totals of 1000 samples of twenty- 
five variates each was as follows: 


TABLE IV 





As a matter of fact, if $r$ is fifty or greater and $s$ is at least ten times as large as $r$, the parent population has relatively little control over the shape of the distribution of samples. But before investigating the limit towards which distributions of samples approach in shape, it is well to present a second illustration of the theory so far developed.


Illustration II, Pearson's Hypergeometric Series.

If from a bag containing $qs$ black and $ps$ white balls, $r$ balls are withdrawn without replacements, the chances that the $r$ balls withdrawn will contain $0, 1, 2, \ldots, z, \ldots, r$ white balls are given by the successive terms of the hypergeometric series

\[ \frac{\binom{ps}{z}\binom{qs}{r-z}}{\binom{s}{r}}. \tag{16} \]

A distribution of this type is equivalent to the simplest case that can arise in accordance with the theory of sampling, that is, by assuming that each variate of the parent population is equal to either zero or one, and that $p$ denotes the proportion of the $s$ variates that have unit value. The moments of the parent population are found as follows:


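TABLE V

Parent Population for Hypergeometric (and Binomial) Series

    Variate x    Frequency f    Deviation \bar{x}    f \bar{x}^n
    0            (1-p)s         -p                   (-1)^n p^n (1-p) s
    1            ps             1-p                  p (1-p)^n s
    Total        s                                   p(1-p)s{(1-p)^{n-1} + (-1)^n p^{n-1}}

Therefore

\[ \mu_{n:x} = p(1-p)\left\{(1-p)^{n-1} + (-1)^n p^{n-1}\right\} = pq\left\{q^{n-1} + (-1)^n p^{n-1}\right\}, \tag{17} \]

where $(p + q = 1)$.

In numerical problems this formula should be used ordinarily as it stands, although for algebraic purposes we may use frequently the forms

\[ \begin{aligned} \mu_{1:x} &= 0 \\ \mu_{2:x} &= pq = p(1-p) \\ \mu_{3:x} &= pq(q^2 - p^2) = p(1-p)(1-2p) \\ \mu_{4:x} &= pq(q^3 + p^3) = p(1-p)(1 - 3p + 3p^2) \end{aligned} \]

etc.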


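Using formulae (2), . . ., we may write the moments for the hypergeometric series as follows:

\[ \mu_{2:z} = s\mu_{2:x}\{\rho_1 - \rho_2\} \]
\[ \mu_{3:z} = s\mu_{3:x}\{\rho_1 - 3\rho_2 + 2\rho_3\} \]

etc.

or if one prefers, with $r^{(n)} = r(r-1)\cdots(r-n+1)$ and $s^{(n)}$ defined likewise,

\[ \mu_{2:z} = spq\left\{\frac{r}{s} - \frac{r^{(2)}}{s^{(2)}}\right\} \]

\[ \mu_{3:z} = spq\,(q^2 - p^2)\left\{\frac{r}{s} - 3\frac{r^{(2)}}{s^{(2)}} + 2\frac{r^{(3)}}{s^{(3)}}\right\} \]

\[ \mu_{4:z} = spq\,(q^3 + p^3)\left\{\frac{r}{s} - 7\frac{r^{(2)}}{s^{(2)}} + 12\frac{r^{(3)}}{s^{(3)}} - 6\frac{r^{(4)}}{s^{(4)}}\right\} + 3s^2p^2q^2\left\{\frac{r^{(2)}}{s^{(2)}} - 2\frac{r^{(3)}}{s^{(3)}} + \frac{r^{(4)}}{s^{(4)}}\right\} \]

etc.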
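
These expressions can be compared with moments computed directly from the hypergeometric probabilities. The sketch below does so in exact rational arithmetic for a bag of 12 balls of which 5 are white, with 4 balls withdrawn; these numbers and the function names are hypothetical choices made for illustration.

```python
from fractions import Fraction
from math import comb


def hypergeom_central_moment(s, white, r, n):
    """n-th moment about the mean of the number of white balls among r drawn."""
    total = comb(s, r)
    pmf = {z: Fraction(comb(white, z) * comb(s - white, r - z), total)
           for z in range(r + 1)}
    mean = sum(z * w for z, w in pmf.items())
    return sum((z - mean) ** n * w for z, w in pmf.items())


def rho(r, s, n):
    """rho_n = r(r-1)...(r-n+1) / (s(s-1)...(s-n+1))."""
    out = Fraction(1)
    for j in range(n):
        out *= Fraction(r - j, s - j)
    return out


def formula_central_moments(s, white, r):
    """Moments of orders 2 to 4 from the sampling formulae with a 0/1 parent."""
    p = Fraction(white, s)
    q = 1 - p
    p1, p2, p3, p4 = (rho(r, s, n) for n in (1, 2, 3, 4))
    return {
        2: s * p * q * (p1 - p2),
        3: s * p * q * (q**2 - p**2) * (p1 - 3 * p2 + 2 * p3),
        4: s * p * q * (q**3 + p**3) * (p1 - 7 * p2 + 12 * p3 - 6 * p4)
        + 3 * s**2 * p**2 * q**2 * (p2 - 2 * p3 + p4),
    }


direct = {n: hypergeom_central_moment(12, 5, 4, n) for n in (2, 3, 4)}
print(direct == formula_central_moments(12, 5, 4))
```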


These will be found equivalent to those given by Pearson*, namely 


- &4(Ss+¢5+P) 
Ke s*(s-/) 


- & B(S+4\S+O\N5+2e)(5+24) 
Fs $3(5-/\s-2) 
Mm sm S3+m 4 3 ' 
M, = 55- Ne 5¥5-3)12 5 (3m,+6m,+ /) 
+3357(m,m,+2m/+2m,) 
+3s m,(m,+6m,)t/6 mi |} 


where 
a=-pP ~ A =-ps 


m, = &+8 Mm, = 48 


II. SAMPLING FROM AN UNLIMITED SUPPLY 


Referring to the formulae of the first part of this paper, we observe that as $s$ approaches infinity, $r$ remaining finite,





*Lond., Edinburgh and Dublin Phil. Mag., Jan.-June, 1899, page 236.








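\[ \begin{aligned} M_z &= rM_x \\ \mu_{2:z} &= r\mu_{2:x} \\ \mu_{3:z} &= r\mu_{3:x} \\ \mu_{4:z} &= r\mu_{4:x} + 3r^{(2)}\mu_{2:x}^2 \\ \mu_{5:z} &= r\mu_{5:x} + 10r^{(2)}\mu_{3:x}\mu_{2:x} \\ \mu_{6:z} &= r\mu_{6:x} + 15r^{(2)}\mu_{4:x}\mu_{2:x} + 10r^{(2)}\mu_{3:x}^2 + 15r^{(3)}\mu_{2:x}^3 \\ \mu_{7:z} &= r\mu_{7:x} + 21r^{(2)}\mu_{5:x}\mu_{2:x} + 35r^{(2)}\mu_{4:x}\mu_{3:x} + 105r^{(3)}\mu_{3:x}\mu_{2:x}^2 \\ \mu_{8:z} &= r\mu_{8:x} + 28r^{(2)}\mu_{6:x}\mu_{2:x} + 56r^{(2)}\mu_{5:x}\mu_{3:x} + 35r^{(2)}\mu_{4:x}^2 \\ &\quad + 210r^{(3)}\mu_{4:x}\mu_{2:x}^2 + 280r^{(3)}\mu_{3:x}^2\mu_{2:x} + 105r^{(4)}\mu_{2:x}^4, \end{aligned} \tag{18} \]

with $r^{(n)} = r(r-1)\cdots(r-n+1)$. From these the following equations may be obtained:

\[ \begin{aligned} \mu_{2:z} &= r\mu_{2:x} \\ \mu_{3:z} &= r\mu_{3:x} \\ \mu_{4:z} - 3\mu_{2:z}^2 &= r\{\mu_{4:x} - 3\mu_{2:x}^2\} \\ \mu_{5:z} - 10\mu_{3:z}\mu_{2:z} &= r\{\mu_{5:x} - 10\mu_{3:x}\mu_{2:x}\} \\ \mu_{6:z} - 15\mu_{4:z}\mu_{2:z} - 10\mu_{3:z}^2 + 30\mu_{2:z}^3 &= r\{\mu_{6:x} - 15\mu_{4:x}\mu_{2:x} - 10\mu_{3:x}^2 + 30\mu_{2:x}^3\} \\ \mu_{7:z} - 21\mu_{5:z}\mu_{2:z} - 35\mu_{4:z}\mu_{3:z} + 210\mu_{3:z}\mu_{2:z}^2 &= r\{\mu_{7:x} - 21\mu_{5:x}\mu_{2:x} - 35\mu_{4:x}\mu_{3:x} + 210\mu_{3:x}\mu_{2:x}^2\} \\ \mu_{8:z} - 28\mu_{6:z}\mu_{2:z} - 56\mu_{5:z}\mu_{3:z} - 35\mu_{4:z}^2 + 420\mu_{4:z}\mu_{2:z}^2 + 560\mu_{3:z}^2\mu_{2:z} - 630\mu_{2:z}^4 &= r\{\mu_{8:x} - 28\mu_{6:x}\mu_{2:x} - 56\mu_{5:x}\mu_{3:x} - 35\mu_{4:x}^2 \\ &\qquad + 420\mu_{4:x}\mu_{2:x}^2 + 560\mu_{3:x}^2\mu_{2:x} - 630\mu_{2:x}^4\} \end{aligned} \tag{19} \]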
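
These relations can be checked exactly for a small parent distribution by forming the distribution of the sum of $r$ independent variates through repeated convolution. In the sketch below the three-point parent distribution, the value $r = 5$ and the function names are hypothetical choices made for illustration.

```python
from fractions import Fraction


def convolve(a, b):
    """Distribution of the sum of two independent discrete variates."""
    out = {}
    for x, p in a.items():
        for y, q in b.items():
            out[x + y] = out.get(x + y, 0) + p * q
    return out


def central_moments(pmf, max_order=8):
    mean = sum(x * p for x, p in pmf.items())
    return {n: sum((x - mean) ** n * p for x, p in pmf.items())
            for n in range(2, max_order + 1)}


def moment_combinations(m):
    """Left-hand members of the equations above, orders 2 to 8."""
    return {
        2: m[2],
        3: m[3],
        4: m[4] - 3 * m[2] ** 2,
        5: m[5] - 10 * m[3] * m[2],
        6: m[6] - 15 * m[4] * m[2] - 10 * m[3] ** 2 + 30 * m[2] ** 3,
        7: m[7] - 21 * m[5] * m[2] - 35 * m[4] * m[3] + 210 * m[3] * m[2] ** 2,
        8: m[8] - 28 * m[6] * m[2] - 56 * m[5] * m[3] - 35 * m[4] ** 2
        + 420 * m[4] * m[2] ** 2 + 560 * m[3] ** 2 * m[2] - 630 * m[2] ** 4,
    }


# Hypothetical parent distribution on the values 0, 1 and 4.
parent = {0: Fraction(1, 2), 1: Fraction(1, 3), 4: Fraction(1, 6)}
r = 5
total = parent
for _ in range(r - 1):
    total = convolve(total, parent)

lhs = moment_combinations(central_moments(total))
rhs = moment_combinations(central_moments(parent))
print(all(lhs[n] == r * rhs[n] for n in range(2, 9)))
```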


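In terms of the standard moments of the distributions these equations become

\[ \begin{aligned} \alpha_{3:z} &= \frac{1}{r^{1/2}}\,\alpha_{3:x} \\ \alpha_{4:z} - 3 &= \frac{1}{r}\{\alpha_{4:x} - 3\} \\ \alpha_{5:z} - 10\alpha_{3:z} &= \frac{1}{r^{3/2}}\{\alpha_{5:x} - 10\alpha_{3:x}\} \\ \alpha_{6:z} - 15\alpha_{4:z} - 10\alpha_{3:z}^2 + 30 &= \frac{1}{r^{2}}\{\alpha_{6:x} - 15\alpha_{4:x} - 10\alpha_{3:x}^2 + 30\} \\ \alpha_{7:z} - 21\alpha_{5:z} - 35\alpha_{4:z}\alpha_{3:z} + 210\alpha_{3:z} &= \frac{1}{r^{5/2}}\{\alpha_{7:x} - 21\alpha_{5:x} - 35\alpha_{4:x}\alpha_{3:x} + 210\alpha_{3:x}\} \\ \alpha_{8:z} - 28\alpha_{6:z} - 56\alpha_{5:z}\alpha_{3:z} - 35\alpha_{4:z}^2 + 420\alpha_{4:z} + 560\alpha_{3:z}^2 - 630 &= \frac{1}{r^{3}}\{\alpha_{8:x} - 28\alpha_{6:x} - 56\alpha_{5:x}\alpha_{3:x} - 35\alpha_{4:x}^2 \\ &\qquad + 420\alpha_{4:x} + 560\alpha_{3:x}^2 - 630\} \end{aligned} \tag{20} \]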


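If, without reference to subscripts, we write

\[ \begin{aligned} \lambda_2 &= \mu_2 \\ \lambda_3 &= \mu_3 \\ \lambda_4 &= \mu_4 - 3\mu_2^2 \\ \lambda_5 &= \mu_5 - 10\mu_3\mu_2 \\ \lambda_6 &= \mu_6 - 15\mu_4\mu_2 - 10\mu_3^2 + 30\mu_2^3 \\ \lambda_7 &= \mu_7 - 21\mu_5\mu_2 - 35\mu_4\mu_3 + 210\mu_3\mu_2^2 \\ \lambda_8 &= \mu_8 - 28\mu_6\mu_2 - 56\mu_5\mu_3 - 35\mu_4^2 + 420\mu_4\mu_2^2 + 560\mu_3^2\mu_2 - 630\mu_2^4 \end{aligned} \tag{21} \]

the distribution of samples from an unlimited supply is defined, so far as moments through the eighth order are concerned, by the relations

\[ M_z = rM_x, \qquad \lambda_{n:z} = r\lambda_{n:x}. \tag{22} \]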


Working along a different line of approach, Thiele was the first to realize the importance of these $\lambda$ functions. He made an extensive study of their unusual properties and was thus both directly and indirectly responsible for many important contributions to the theory of mathematical statistics. These values of $\lambda_n$ are the so-called "semi-invariants of Thiele."


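Again, we may write

\[ \begin{aligned} \gamma_3 &= \alpha_3 \\ \gamma_4 &= \alpha_4 - 3 \\ \gamma_5 &= \alpha_5 - 10\alpha_3 \\ \gamma_6 &= \alpha_6 - 15\alpha_4 - 10\alpha_3^2 + 30 \\ \gamma_7 &= \alpha_7 - 21\alpha_5 - 35\alpha_4\alpha_3 + 210\alpha_3 \\ \gamma_8 &= \alpha_8 - 28\alpha_6 - 56\alpha_5\alpha_3 - 35\alpha_4^2 + 420\alpha_4 + 560\alpha_3^2 - 630 \end{aligned} \tag{23} \]

and observe that the shape of the distribution of samples is determined by the relation

\[ \gamma_{n:z} = \frac{\gamma_{n:x}}{r^{\frac{n}{2} - 1}}, \tag{24} \]

which follows from equations (20).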


The values $\gamma_n$ are referred to as the "standardized semi-invariants of Thiele."


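If now $r$ be permitted to approach infinity as a limit, we observe that in this limiting situation the shape of the distribution of samples is entirely independent of the shape of the parent population, since

\[ \lim_{r \to \infty} \gamma_{n:z} = 0, \]

that is

\[ \begin{aligned} \alpha_{3:z} &= 0 \\ \alpha_{4:z} - 3 &= 0 \\ \alpha_{5:z} - 10\alpha_{3:z} &= 0 \\ \alpha_{6:z} - 15\alpha_{4:z} - 10\alpha_{3:z}^2 + 30 &= 0 \end{aligned} \]

etc.

Thus the limiting distribution, which is called "the Normal Curve," must have the following properties:

\[ \begin{aligned} \alpha_{3:z} &= 0 \\ \alpha_{4:z} &= 1 \cdot 3 \\ \alpha_{5:z} &= 0 \\ \alpha_{6:z} &= 1 \cdot 3 \cdot 5 \\ \alpha_{7:z} &= 0 \\ \alpha_{8:z} &= 1 \cdot 3 \cdot 5 \cdot 7 \end{aligned} \tag{25} \]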

THE THEOREM OF BERNOULLI 


If $p$ denotes the probability that an event will happen in a single trial and $q = 1 - p$ the probability that it will not happen in that trial, then the probability that the event will happen exactly $z$ times during $r$ trials is, by Bernoulli's Theorem

\[ B_{r:z} = \binom{r}{z} q^{r-z} p^{z}. \tag{26} \]


From our point of view we need only regard the problem as one of sampling in which we withdraw samples of $r$ variates from an infinite parent population, in which, as per Table V, $p$ designates the proportion of the variates which are of unit magnitude, the remaining variates being zero in magnitude. Then since

\[ \mu_{n:x} = pq\left\{q^{n-1} + (-1)^n p^{n-1}\right\} \]

we see from formulae (18) that

\[ \begin{aligned} M_z &= rp \\ \mu_{2:z} &= rpq \\ \mu_{3:z} &= rpq\{q^2 - p^2\} \\ \mu_{4:z} &= rpq\{q^3 + p^3\} + 3r(r-1)p^2q^2 \\ \mu_{5:z} &= rpq\{q^4 - p^4\} + 10r(r-1)p^2q^2\{q^2 - p^2\} \end{aligned} \tag{27} \]

etc.








POISSON'S EXPONENTIAL BINOMIAL LIMIT


If the probability that each of 1000 individuals die in one year were .5, then the expected number of deaths in such a group for one year would be 500. On the other hand, if the probability that each of 10,000 die in the year were .05 then the expected number of deaths would also be 500. Again $r = 100000$ and $p = .005$ or $r = 1000000$ and $p = .0005$ would give the same value. If we continue after this fashion to let $r$ approach infinity and $p$ zero, but in such a manner that the product $rp = M_z$ remains constant, then it can be shown quite readily that (26) becomes

\[ \lim_{\substack{r \to \infty \\ rp = M_z}} B_{r:z} = \frac{e^{-M_z} M_z^{\,z}}{z!}. \tag{28} \]


This is known as Poisson's Exponential Binomial Limit. For a Poisson distribution it follows from (27) that

\[ \begin{aligned} \mu_{2:z} &= M_z \\ \mu_{3:z} &= M_z \\ \mu_{4:z} &= M_z + 3M_z^2 \\ \mu_{5:z} &= M_z + 10M_z^2 \\ \mu_{6:z} &= M_z + 25M_z^2 + 15M_z^3 \\ \mu_{7:z} &= M_z + 56M_z^2 + 105M_z^3 \\ \mu_{8:z} &= M_z + 119M_z^2 + 490M_z^3 + 105M_z^4 \end{aligned} \tag{29} \]

Substituting these values back in the definitions of the semi-invariants (formulae 21), we observe that for a Poisson distribution

\[ \lambda_{n:z} = M_z, \qquad (n = 2, 3, \ldots, 8). \tag{30} \]
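
The moments listed for the Poisson distribution can be confirmed numerically by summing the series directly. In the sketch below the mean 2.5, the truncation at 200 terms and the function name are arbitrary choices made for illustration.

```python
from math import exp


def poisson_central_moments(m, max_order=8, terms=200):
    """Moments about the mean of a Poisson distribution, by direct summation."""
    pmf, p = [], exp(-m)
    for z in range(terms):
        pmf.append(p)          # p = e^{-m} m^z / z!
        p *= m / (z + 1)
    return {n: sum((z - m) ** n * w for z, w in enumerate(pmf))
            for n in range(2, max_order + 1)}


m = 2.5
mu = poisson_central_moments(m)
expected = {
    2: m,
    3: m,
    4: m + 3 * m**2,
    5: m + 10 * m**2,
    6: m + 25 * m**2 + 15 * m**3,
    7: m + 56 * m**2 + 105 * m**3,
    8: m + 119 * m**2 + 490 * m**3 + 105 * m**4,
}
for n in range(2, 9):
    print(n, round(mu[n], 6), expected[n])
```

The two columns printed for each order should agree to within rounding error.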


DISCUSSION OF RESULTS


So far as I know, no general method has been worked out which will permit one to express complex summations, such as those on pages 103, 104, in terms of moments. Moreover, I am unable at present to justify the use of the "sampling polynomials" for the moments of the samples of an order higher than the eighth. Laborious computations have established the fact that the apparent law of the sampling polynomials holds for the first eight moments, and hence we have a simple method at our disposal of writing down expressions for these moments of samples withdrawn from finite parent populations. A study of these sampling polynomials should reveal an entirely different approach to the problem. This is but one of many interesting problems of mathematical statistics that require further investigation.


Although we utilized the results of sampling from a limited supply to obtain corresponding formulae for sampling from an unlimited supply, nevertheless it can be shown that for $s = \infty$ a simple method exists for expressing the moments $\mu_{n:z}$ in terms of the moments $\mu_{n:x}$, as in formulae (18). Moreover, this law holds for any positive integer, $n$.








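Thus

\[ \begin{aligned} \mu_{20:z} ={}& \frac{20!}{20!}\,r\mu_{20:x} + \frac{20!}{18!\,2!}\,r^{(2)}\mu_{18:x}\mu_{2:x} + \cdots \\ &+ \frac{20!}{7!\,5!\,4!\,4!}\,\frac{r^{(4)}}{2!}\,\mu_{7:x}\mu_{5:x}\mu_{4:x}^2 + \cdots + \frac{20!}{(2!)^{10}}\,\frac{r^{(10)}}{10!}\,\mu_{2:x}^{10}, \end{aligned} \]

$r^{(k)}$ denoting $r(r-1)\cdots(r-k+1)$.

Since formulae, such as (3a) and (4a), are based on multinomial considerations, the rule for writing down the values of $\mu_{n:z}$ is valid for any value of $n$, when $s = \infty$.

Proceeding after this fashion, one can show that corresponding to formulae (25) one can write for the limiting distribution, referred to as the Normal Curve,

\[ \alpha_{2n+1:z} = 0, \qquad \alpha_{2n:z} = \frac{(2n)!}{2^n\,n!}. \tag{31} \]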


And since the function

\[ y = \frac{1}{\sqrt{2\pi}}\,e^{-\frac{t^2}{2}} \tag{32} \]

satisfies the above conditions, we say that (32) is the equation of the Normal Curve. In the Theory of Least Squares this equation is usually* developed on the so-called Hagen's hypothesis, that is "An error is the algebraic sum of an indefinitely great number of small elementary errors which are all equal, and each of which is equally likely to be positive or negative."

From the results that we have obtained it appears that it is not necessary to impose the restrictions that the elementary errors are all equal and that positive and negative values are equally likely. It is necessary only that

(1) the number of elementary errors be infinite, although of an order less than that of the number of errors in the parent population.

(2) the errors be independent. This restriction is really involved in our assumption that in evaluating summations, each of the $s$ variates of the parent population occurs exactly as many times as every other variate.

Otherwise, the limiting shape of the distribution of samples is independent of the shape of the parent distribution. The fact that tables II and IV, arising from parent distributions that are so extremely abnormal, exhibit distributions of samples that are fairly normal, seems to bear out our point in spite of the fact that we employed in each instance a small value of $r$, i. e. twenty-five.

*See Merriman's Method of Least Squares. John Wiley and Sons, New York City.
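
That the function (32) has the standard moments stated for the Normal Curve can be checked by numerical integration. The sketch below uses the trapezoidal rule over a finite range; the range, the step count and the function names are arbitrary choices made for illustration.

```python
from math import exp, factorial, pi, sqrt


def normal_even_moment(n):
    """(2n)! / (2^n n!), which equals 1 * 3 * 5 * ... * (2n - 1)."""
    return factorial(2 * n) // (2**n * factorial(n))


def numeric_moment(order, half_width=12.0, steps=48000):
    """Trapezoidal estimate of the integral of t**order * y(t) over (-12, 12)."""
    h = 2 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        t = -half_width + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * t**order * exp(-t * t / 2)
    return total * h / sqrt(2 * pi)


for n in (1, 2, 3, 4):
    print(2 * n, normal_even_moment(n), round(numeric_moment(2 * n), 6))
print(7, round(numeric_moment(7), 6))
```

The even moments should reproduce 1, 3, 15 and 105, and the odd moment should vanish to within rounding error.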





